In this second article on Rplicate Series, We will share to you one way to replicate the plot titled Cherry Bomb from the article “Japan’s cherry blossoms are emerging increasingly early, 7th April 2017”. The raw dataset used for this graph was obtained from a phenological dataset that was previously collected by Dr. Yasuyuki Aono from Osaka Prefecture University12.

While replicating the plot, we will also learn how to:
- various data pre-processing tricks such as separate a column into several columns, create date data from several columns, and reverse levels of a categorical column
- create a scatterplot with a custom shape/image as points
- using
geom_smooth()
for visualizing trend in the data - explore facets and various theme editing on ggplot2
Let’s dive in below!
Load Packages
These are the basic packages that we will use. Along the way, there will be some packages that we will load just before its use to better understand and remember its functionality.
library(dplyr) # to simplify data transformation
library(tidyr) # to tidying data
library(lubridate) # to deal with date
library(ggplot2) # to create plot
Import Dataset
The first thing we have to do is to load the dataset into R.
sakura <- read.csv("data_input/sakura.csv")
tail(sakura)
#> AD full.flowering.doy full.flowering.date source data.type
#> 1213 2013 93 403 8 0
#> 1214 2014 94 404 8 0
#> 1215 2015 93 403 8 0
#> 1216 2016 NA 404 8 0
#> 1217 2017 NA 409 8 0
#> 1218 2018 NA 330 8 0
#> reference
#> 1213 NEWS-PAPER(ARASHIYAMA)
#> 1214 NEWS-PAPER(ARASHIYAMA)
#> 1215 NEWS-PAPER(ARASHIYAMA)
#> 1216 NEWS-PAPER(ARASHIYAMA)
#> 1217 NEWS-PAPER(ARASHIYAMA)
#> 1218 NEWS-PAPER(ARASHIYAMA)
These are the descriptions of each column:
- AD: year in A.D.
- full.flowering.doy: full flowering day of year
- full.flowering.date: full flowering date (eg.: 401 = April 1st)
- source: source code
- data.type: data source code
- reference: name of old document/references
Data Wrangling
The next step is data wrangling or the process of tidying and transforming data from a raw dataset into another format that is easier for plotting.
# selecting column for plotting
sakura <- sakura %>%
select(AD, full.flowering.date)
sample_n(sakura,6) # sampling rows
#> AD full.flowering.date
#> 1 1367 407
#> 2 1589 415
#> 3 1957 413
#> 4 995 NA
#> 5 939 NA
#> 6 1590 412
After some exploration, we find that most of the data came from early AD contain missing records of its full flowering date. Therefore, the next step is to obtain the data with no missing values on its full.flowering.date
, and for better visualization, we can also rename the “AD” column to “Year”.
sakura.used <- sakura %>%
filter(full.flowering.date != "NA") %>%
rename(Year=AD)
head(sakura.used)
#> Year full.flowering.date
#> 1 812 401
#> 2 815 415
#> 3 831 406
#> 4 851 418
#> 5 853 414
#> 6 864 409
If you see the data above, the month and day of month information is still stored in one column full.flowering.date
. Meanwhile, we need both information separately to display it on the y-axis. Moreover, it is still stored in an integer format with no separator between both information. We will need to separate both information into different columns. To do that, we can use the separate()
function, as seen below:
sakura.plot <- sakura.used %>%
separate(full.flowering.date,
into = c("month", "day"), # name of the new columns
sep = -2) # separate from the last 2 digits
head(sakura.plot)
#> Year month day
#> 1 812 4 01
#> 2 815 4 15
#> 3 831 4 06
#> 4 851 4 18
#> 5 853 4 14
#> 6 864 4 09
Now, we only need to remove the leading zero of some digits in the day
column. We can use the function f_num()
from numform package. The numform package contains various functions to format numbers for publication.
library(numform)
sakura.plot <- sakura.plot %>%
mutate(day = f_num(day,
digits = 0)) # to set for 0 number behind decimal point
head(sakura.plot)
#> Year month day
#> 1 812 4 1
#> 2 815 4 15
#> 3 831 4 6
#> 4 851 4 18
#> 5 853 4 14
#> 6 864 4 9
The next step we are going to do is to combine year, month, and day information into a full date format. Let’s store it into a column named date
. After that, let’s replace the month value into its respective month label. We can also change the data type of column day
into integers, or just replace it with the day information obtained from the column date
.
sakura.plot <- sakura.plot %>%
mutate(date = as.Date(paste(Year, month, day, sep = "-")),
month = month(date, label = T, abbr = F),
day = day(date))
head(sakura.plot)
#> Year month day date
#> 1 812 April 1 0812-04-01
#> 2 815 April 15 0815-04-15
#> 3 831 April 6 0831-04-06
#> 4 851 April 18 0851-04-18
#> 5 853 April 14 0853-04-14
#> 6 864 April 9 0864-04-09
unique(sakura.plot$month)
#> [1] April March May
#> 12 Levels: January < February < March < April < May < June < ... < December
If we look at the order of our month
column, it is still in the reverse format. We can reverse the order of the levels by using the function fct_rev()
from forcats package. Forcats package itself contains many functions form working with categorical variables/factors.
sakura.plot <- sakura.plot %>%
mutate(month = forcats::fct_rev(month))
unique(sakura.plot$month)
#> [1] April March May
#> 12 Levels: December < November < October < September < August < ... < January
We have now obtained the data ready for plotting.
Create Plot
In this section, we will display the code for plotting followed by its result below. Try tracking on what has been added to the code and how it gives changes to the plot!
The Canvas
# blank canvas
p <- ggplot(sakura.plot, aes(x = Year, y = day))
p
Add Facets
# add facets_grid by month, horizontally
p + facet_grid(month~.)
# customize facets
p2 <- p + facet_grid(month~.,
scales = "free", # free scales on both x & y axis
space = "free", # panel's height & width will vary based on data
switch = "y") # facet label to be displayed on th right
p2
Add geom_point
The basic idea is to create a scatterplot from our data.
# basic idea
p2 + geom_point()
But let’s make it more interesting by changing the point shape into a custom cherry blossom image. There are several ways to add custom shape as point (scatterplot) in ggplot:
1. Use extended ASCII symbol in the argument shape
The standard shape in geom_point only has options from shape 1-25. The extended number is actually options for various ASCII characters. We can use the ASCII character number 42 to obtain a flower-like shape as seen below.
p3 <- p2 + geom_point(shape = 42, size = 5, color = "#f64b77")
p3
2. Add image as point using geom_image()
We can use geom_image()
from the package ggimage to add an image as a point in our plot. For that to work, we need to store the custom image into a column in our data. In this demo, we will use an icon made by Freepik from Flaticon. This is our icon image that has been saved locally:

# adding new column to store image
sakura.plot2 <- sakura.plot %>%
mutate(image = "img/cherry.png")
# create plot
library(ggimage)
ggplot(sakura.plot2, aes(x = Year, y = day)) +
facet_grid(month~., scales = "free", space = "free_y", switch = "y") +
geom_image(aes(image = image))
The latter option seems to be more flexible, but unfortunately, the image cannot be displayed properly due to the small panel size (result from setting space = "free"
in facets). For that reason, we will use the first option instead.
Add geom_smooth
Below we will add a smooth line using geom_smooth()
. Because we also need legend for our smooth line, we can specify the color and fill label within aes()
.
p4 <- p3 +
# make trend line
geom_smooth(aes(fill = "Trend"), span = 0.1, se = FALSE, color = "#644128") +
# make confidence interval with no trend line
geom_smooth(aes(color = "Confidence Interval"), span = 0.1, fill = "#a56c56", linetype = 0)
p4
Costumize Axis
We can specify our axis manually for our plot that (unfortunately) did not follow a specific sequence.
p5 <- p4 +
scale_x_continuous(
limits = c(812,2020),
breaks = c(800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2016),
labels = c("800", "", "1000", "", "1200", "", "1400", "", "1600", "", "1800", "", "", "2016")) +
scale_y_continuous(
breaks = c(1, 10, 20),
labels = c("1st", "10th", "20th"))
p5
Add Labels
We can add labels and make some of them italics using the function expression()
:
p6 <- p5 +
labs(title = "Cherry Bomb",
subtitle = "Date of cherry-blossom peak-bloom in Kyoto, Japan, 800AD - 2016",
x = expression(italic("Year")),
y = expression(italic("Date of cherry-blossom peak-bloom")),
caption = "Source: Yasuyuki Aono, Osaka Prefecture University")
p6
Costumize Theme
Last but not least is to custom our theme. Below is an extensive list of options we can use to custom our plot theme, and there are still many more. You can try to modify this code and see how it affects our plot to better understand its use. We can also use a custom font which we have discussed in the previous article “Rplicate Series: Happiness of The Third Age”.
library(extrafont)
# importing fonts to R
# font_import()
# load all fonts so it can be used
loadfonts(device = "win")
cherrybomb <- p6 + theme(
# general
text = element_text(family = "Segoe UI"),
# panel
panel.background = element_rect(fill = "white"),
panel.grid.major.y = element_line(color = "#a56c56",
linetype = "solid"),
panel.grid.major.x = element_blank(),
panel.grid.minor = element_blank(),
panel.spacing = unit(0.1, "cm"),
# axis
axis.line.x = element_line(colour = "black"),
axis.line.y = element_blank(),
axis.text = element_text(size = 8),
axis.title = element_text(size = 8),
# legends
legend.position = c(0.81, 1),
legend.key = element_blank(),
legend.box = "horizontal",
legend.box.spacing = unit(1, "mm"),
legend.title = element_blank(),
legend.text = element_text(size = 8),
legend.background = element_blank(),
# labs
plot.title = element_text(hjust = 0,
face = "bold",
size = 12),
plot.subtitle = element_text(hjust = 0,
face = "plain",
size = 10),
plot.caption = element_text(size = 8,
colour = "#B3B1B1",
hjust = 0),
# facets title
strip.placement = "outside",
strip.background = element_rect(fill = "#e8dbd6"),
strip.switch.pad.grid = unit(0.2, "cm"),
strip.text = element_text(size = 7))
And here is our final Cherry Bomb Plot!
cherrybomb
Additional Notes
This graph was both fun and challenging to make. Unfortunately, we have not managed to replicate one minor detail. If you take a more detailed look at the plot, we still have an uncomplete label for the tickmarks in the y-axis. I find it difficult to create different y-axis tickmarks and their labels for each facet in the plot. It is very unfortunate that the facet_grid function is not yet provided with the ability to have/set different scales/limits and breaks for each facet. And yet, I also haven’t found a way to assign different breaks and limits for each facet. Thankfully the communities are developing it right now. For more info, you can click here.
Thank you for reading and we hope this article can help you to find the delight in exploring data visualization in R. Happy learning and coding!