In this third article on Rplicate Series, We will share to you one way to replicate The Economist plot titled “I can’t get no …”. The plot making will also demonstrate ways to converting smooth line into linear line, using guides for legend, and adding a text annotation to a plot. The raw dataset used for this graph was obtained from a PDF, so we will also use tibble package to copy-paste the data into R.
Let’s dive in below!
Load Packages
These are the basic packages that we will use. Along the way, there will be some packages that we will load just before its use to better understand and remember its functionality.
library(dplyr)
library(ggplot2)
Dataset
Data Collection
The data used in this plot is sourced from a PDF file. Fortunately, we can literally copy-paste the data using tribble()
from tibble package. Click here for more detailed documentation of tribble.
library(tibble)
data1 <- tribble(
~V1, ~V2, ~V3, ~V4, ~V5,
"United States", 47L, 58L, 56L, 61L,
"Canada", 73L, 79L, 78L, 82L,
"France", 47L, 54L, 62L, 73L,
"Germany", 72L, 77L, 75L, 85L,
"Greece", 18L, 25L, 22L, 42L,
"Hungary", 42L, 49L, 54L, 63L,
"Italy", 36L, 46L, 45L, 60L,
"Netherlands", 74L, 82L, 82L, 89L,
"Poland", 40L, 43L, 43L, 59L,
"Spain", 33L, 48L, 48L, 63L,
"Sweden", 66L, 78L, 78L, 86L,
"UK", 52L, 64L, 63L, 74L,
"Russia", 55L, 68L, 60L, 79L,
"Australia", 48L, 60L, 57L, 65L,
"India", 65L, 72L, 80L, 72L,
"Indonesia", 77L, 85L, 85L, 89L,
"Japan", 55L, 55L, 65L, 74L,
"Philippines", 78L, 83L, 86L, 87L,
"South Korea", 27L, 26L, 36L, 44L,
"Vietnam", 78L, 78L, 80L, 85L,
"Israel", 42L, 50L, 63L, 78L,
"Jordan", 46L, 47L, 55L, 60L,
"Lebanon", 36L, 62L, 48L, 73L,
"Tunisia", 48L, 52L, 56L, 58L,
"Turkey", 57L, 73L, 65L, 74L,
"Ghana", 68L, 70L, 77L, 81L,
"Kenya", 70L, 75L, 80L, 82L,
"Nigeria", 67L, 68L, 71L, 74L,
"Senegal", 75L, 75L, 72L, 84L,
"South Africa", 65L, 69L, 73L, 76L,
"Tanzania", 83L, 89L, 93L, 92L,
"Argentina", 37L, 38L, 45L, 56L,
"Brazil", 45L, 54L, 57L, 66L,
"Chile", 36L, 36L, 42L, 55L,
"Colombia", 38L, 43L, 41L, 57L,
"Mexico", 58L, 55L, 62L, 68L,
"Peru", 42L, 42L, 51L, 59L,
"Venezuela", 52L, 50L, 58L, 62L
)
data2 <- tribble(
~V1, ~V2, ~V3, ~V4, ~V5, ~V6, ~V7, ~V8,
"United States", "Spring, 2017", 15L, 36L, 24L, 23L, 2L, 100L,
"Canada", "Spring, 2017", 20L, 47L, 18L, 14L, 1L, 100L,
"France", "Spring, 2017", 3L, 17L, 36L, 42L, 1L, 100L,
"Germany", "Spring, 2017", 26L, 43L, 19L, 10L, 1L, 100L,
"Greece", "Spring, 2017", 1L, 12L, 26L, 60L, 1L, 100L,
"Hungary", "Spring, 2017", 9L, 48L, 25L, 16L, 2L, 100L,
"Italy", "Spring, 2017", 1L, 25L, 47L, 26L, 2L, 100L,
"Netherlands", "Spring, 2017", 24L, 47L, 20L, 9L, 1L, 100L,
"Poland", "Spring, 2017", 14L, 28L, 31L, 24L, 4L, 100L,
"Spain", "Spring, 2017", 5L, 12L, 40L, 41L, 1L, 100L,
"Sweden", "Spring, 2017", 15L, 52L, 23L, 10L, 0L, 100L,
"United Kingdom", "Spring, 2017", 14L, 35L, 28L, 20L, 4L, 100L,
"Russia", "Spring, 2017", 20L, 47L, 24L, 6L, 3L, 100L,
"Australia", "Spring, 2017", 7L, 41L, 29L, 21L, 2L, 100L,
"India", "Spring, 2017", 39L, 46L, 7L, 4L, 4L, 100L,
"Indonesia", "Spring, 2017", 30L, 53L, 15L, 1L, 0L, 100L,
"Japan", "Spring, 2017", 6L, 51L, 33L, 7L, 3L, 100L,
"Philippines", "Spring, 2017", 36L, 44L, 16L, 4L, 0L, 100L,
"South Korea", "Spring, 2017", 2L, 21L, 50L, 25L, 2L, 100L,
"Vietnam", "Spring, 2017", 31L, 51L, 16L, 1L, 1L, 100L,
"Israel", "Spring, 2017", 14L, 37L, 33L, 14L, 2L, 100L,
"Jordan", "Spring, 2017", 13L, 35L, 30L, 18L, 3L, 100L,
"Lebanon", "Spring, 2017", 2L, 13L, 42L, 43L, 1L, 100L,
"Tunisia", "Spring, 2017", 11L, 18L, 14L, 56L, 2L, 100L,
"Ghana", "Spring, 2017", 51L, 19L, 13L, 13L, 4L, 100L,
"Kenya", "Spring, 2017", 28L, 40L, 20L, 12L, 1L, 100L,
"Nigeria", "Spring, 2017", 34L, 20L, 21L, 24L, 1L, 100L,
"Senegal", "Spring, 2017", 37L, 23L, 22L, 16L, 2L, 100L,
"South Africa", "Spring, 2017", 22L, 22L, 28L, 24L, 3L, 100L,
"Tanzania", "Spring, 2017", 48L, 41L, 5L, 4L, 1L, 100L,
"Argentina", "Spring, 2017", 10L, 12L, 41L, 34L, 3L, 100L,
"Brazil", "Spring, 2017", 2L, 22L, 18L, 56L, 3L, 100L,
"Chile", "Spring, 2017", 5L, 10L, 51L, 32L, 2L, 100L,
"Colombia", "Spring, 2017", 7L, 5L, 51L, 36L, 1L, 100L,
"Mexico", "Spring, 2017", 2L, 15L, 39L, 43L, 1L, 100L,
"Peru", "Spring, 2017", 4L, 8L, 55L, 31L, 2L, 100L,
"Venezuela", "Spring, 2017", 14L, 15L, 24L, 45L, 2L, 100L
)
# adding colnames to data1
colnames(data1) <- c("Country","Political_issues", "News about govt. leaders and officials","News accurately","Most important news events")
# adding colnames to data2
colnames(data2) <- c("Country","x", "A lot", "Somewhat", "Not much", "Not at all", "DK/Refused", "Total")
Below is a glimpse of our data:
head(data1, 3)
#> # A tibble: 3 x 5
#> Country Political_issues `News about govt.~ `News accuratel~ `Most important~
#> <chr> <int> <int> <int> <int>
#> 1 United ~ 47 58 56 61
#> 2 Canada 73 79 78 82
#> 3 France 47 54 62 73
head(data2, 3)
#> # A tibble: 3 x 8
#> Country x `A lot` Somewhat `Not much` `Not at all` `DK/Refused` Total
#> <chr> <chr> <int> <int> <int> <int> <int> <int>
#> 1 United St~ Spring~ 15 36 24 23 2 100
#> 2 Canada Spring~ 20 47 18 14 1 100
#> 3 France Spring~ 3 17 36 42 1 100
Feature Selection
There are some features that we need to add and also select for visualization. But before that, we should first join our data. Here, we can join our data using the function merge()
from base R.
# join data
dataset <- merge(data1, data2, by = "Country")
# feature engineering `gov`
data_f <- dataset %>%
mutate(gov = `A lot` + Somewhat) %>%
select(Country,Political_issues,gov)
head(data_f, 3)
#> Country Political_issues gov
#> 1 Argentina 37 22
#> 2 Australia 48 48
#> 3 Brazil 45 24
# preparing region-country labels
Asia_Oceania <- c("Australia", "India", "Indonesia", "Japan",
"Philippines", "South Korea", "Vietnam")
Europe <- c("France", "Germany", "Greece", "Hungary", "Italy",
"Netherlands", "Poland", "Spain", "Sweden", "UK", "Russia")
MidEast_NAfrica <- c("Israel", "Jordan", "Lebanon",
"Tunisia", "Turkey")
SouthAmerica <- c("Argentina", "Brazil", "Chile",
"Colombia", "Mexico", "Peru", "Venezuela")
SubSaharaAfrica <- c("Ghana", "Kenya", "Nigeria",
"Senegal", "South Africa", "Tanzania")
NorthAmerica <- c("United States", "Canada")
# feature engineering `region`
data_f <- data_f %>%
mutate(region = ifelse(data_f$Country %in% Asia_Oceania, 'Asia & Oceania',
ifelse(data_f$Country %in% Europe, 'Europe',
ifelse(data_f$Country %in% MidEast_NAfrica, 'Middle East & North Africa',
ifelse(data_f$Country %in% SouthAmerica, 'South America',
ifelse(data_f$Country %in% SubSaharaAfrica, 'Sub-Sahara Africa',
ifelse(data_f$Country %in% NorthAmerica, 'North America',
NA)))))))
# quick check
head(data_f, 3)
#> Country Political_issues gov region
#> 1 Argentina 37 22 South America
#> 2 Australia 48 48 Asia & Oceania
#> 3 Brazil 45 24 South America
Plotting
Before making anything, we can first load fonts from our locale to R using extrafont package. This will allow R to use custom fonts in its plot. Discussions about using custom font can be found in the previous Rplicate Series articles.
library(extrafont)
# importing fonts to R
# font_import()
# load all fonts so it can be used
loadfonts(device = "win")
Basic Plot
We can first make the Basic Plot using the code below:
g1 <- ggplot(data = data_f,
aes(x = gov, y = Political_issues)) +
geom_point(aes(colour = region, fill = region), # we will edit the color & fill manually later
shape = 21, size = 3.5, stroke = 1) +
scale_x_continuous(breaks = seq(0,100, by=20), limits = c(0,100)) +
scale_y_continuous(breaks = seq(0,100, by=20), limits = c(0,100),
expand = expansion(add = c(0, 5))) # adjusting spaces on the min-max y-axis)
g1
Add Linear Line
This is the more interesting part of making the plot. When using geom_smooth()
, we usually aim to display a trend line following pattern in our data and ended up obtaining a slightly curved line. Little that we know, this function can also be used for making a linear line, we only need to adjust its parameter method
on how to make the trend line using a specific algorithm.
g2 <- g1 +
geom_smooth(aes(group = 1), # making 1 line
method = "lm", # making a linear plot (using linear model)
# formula = y ~ x, # we can also edit the linear model formula manually here
se = FALSE, # not displaying standard error
color = "#77200b") # color of line
g2
Now, the trend line was made but it disturb our view on the other geoms. It is best to rearrange the geoms so that the trend line positioned in the first layer of the plot. The combined codes from the previous steps are below and we will use it for further process.
g2 <- ggplot(data = data_f,
aes(x = gov, y = Political_issues)) +
geom_smooth(aes(group = 1),
method = "lm",
se = FALSE,
color = "#77200b") +
geom_point(aes(colour = region, fill = region),
shape = 21, size = 3.5, stroke = 1) +
scale_x_continuous(breaks = seq(0,100, by=20), limits = c(0,100)) +
scale_y_continuous(breaks = seq(0,100, by=20), limits = c(0,100),
expand = expansion(add = c(0, 5)))
g2
Add Repelled Text
Just like in the previous “Rplicate Series: Happiness of The Third Age”, we can add repelled text label using geom_text_repel()
from package ggrepel.
library(ggrepel)
g3 <- g2 + geom_text_repel(aes(label = Country), force = 2,
color = "grey20", size = 3, family = "Segoe UI")
g3
Add Labels & Title
g4 <- g3 +
labs(title = "I can't get no...",
subtitle = "Trust in goverment and satifaction with the media, 2017",
x = "Trust the national goverment to do what is right for our country, % of respondents",
y = "Satisfied with the news media in our country*, % of respondents",
caption = "Source: Pew Research Centre",
color = NULL,
fill = NULL) # removing legend title
g4
Custom Point Colour
To further replicate the original plot, we can also set colors and fill manually for points using scale_color/fill_manual()
. Note that to do that, we need to specify a point shape that consist of area and border, such as shape 21. In the function, we can add values of the color/fill and labels for its respective color. We should also note that the order is based on the original level order in the data.
g5 <- g4 + scale_color_manual(values = c("#DE8321","#1D84C6","#0B743D",
"#000000", "#DA3411","#04452D"),
# we can provide custom label as well, respected to the data
labels = c("Asia &\nOceania", "Europe","Middle East &\nNorth Africa",
"North\nAmerica", "South America", "Sub-Sahara Africa")
) +
scale_fill_manual(values = c("#EB983F","#3FCEEB","#3FB879",
"#E33010", "#FFFFFF","#0C8E5E"),
labels = c("Asia &\nOceania", "Europe","Middle East &\nNorth Africa",
"North\nAmerica", "South America", "Sub-Sahara Africa")
)
g5
Customize Theme
Next, we can tidy our plot using themes using the codes below.
g6 <- g5 + theme(
# general
text = element_text(family = "Segoe UI"),
# legend
legend.key.size = unit(0.1, "lines"), # set space between keys
legend.key = element_rect(fill = NA), # removing backround of keys
legend.text = element_text(size = 10),
legend.position = "top",
legend.direction = "horizontal",
# axis
axis.title = element_text(face = "italic", size = 10),
# plot (all plot area including title, axis, etc.)
plot.title = element_text(face = "bold", size = 15),
plot.subtitle = element_text(size = 10,
# adding some space between subtitle & legend
margin = margin(t =0,
r = 0,
b = 10,
l = 0,
unit = "pt")),
plot.caption = element_text(size = 8, hjust = 0, vjust = 0.5),
# panel (area where we map our geoms)
panel.border = element_blank(),
panel.background = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_line(size = 0.1, color = "grey66"),
panel.grid.minor = element_blank()
)
g6
Guides & Annotation
To have more flexibility in editing plot features such as legend, we can use guides()
from ggplot2. There are various graphical features which you can apply guides into such as guide_axis()
, guide_colorbar()
, guide_geoms()
, etc. We encourage you to visit the official ggplot2 documentation and explore it according to your needs on the net. Here, we will only use guide_legend()
to custom our legend so that it will only display 1 row.
g7 <- g6 + guides(color = guide_legend(nrow = 1))
g7
Last but not least, we can add text annotation into our plot using the function annotate()
from ggplot2. There are various geoms you can annotate inside a plot, such as a rectangle, an arrow, a segment, etc. For a friendly introduction, you can refer to R Graph Gallery – Annotations or to the official ggplot2 documentation.
Here, we will add annotations for the geometrical element segment
and text
for our trend line legend.
g8 <- g7 + annotate(geom = "segment", # making a line segment
x = 67, xend = 69, y = 13, yend = 13, # position of segment
colour = "#77200b",
size = 1
) +
annotate(geom = "text", # making a text
x = 70, y = 10, # position of text
label = "Line of best fit,\ncorrelation coefficient = 87%",
fontface = "italic", color = "#77200b", size = 4, family = "Segoe UI",
hjust = 0) # set the text on the right side of segment
Final Plot
Now let’s display our final plot! We can display our plot with the desired height and width by specifying it in the R chunk setting. Below is the plot created with an R Markdown chunk setting fig.height = 6, fig.width = 7
.
g8
Thank you for reading and we hope this article can help you to create a more interesting visualization. We hope you can find the delight in exploring data visualization in R. Happy learning and coding!