Welcome again to the Rplicate Series! In this 5th article of the series. On this occassion, we will try to replicate the first interactive plot from the FiveThirtyEight article titled How Americans View The Coronavirus Crisis And Trump’s Response. This time you’ll learn how to build an interactive plot using highcharter
.
Library and Setup
Below is the required package that we will use during data wrangling and chart creation.
# Data Wrangling
library(tidyverse)
library(lubridate)
# Visualization
library(highcharter)
library(scales)
Data Collection
All data is available directly from the article below the plot. You can directly download them or visit this link.
Concern Top Line
We load the data that correspond to the line for each survey responses. On the survey, people can choose between four different responses: Not At All, Not Very, Somewhat, and Very for both their concern about economic and the viral infection.
concern_topline <- read.csv("data_input/covid/covid_concern_toplines.csv")
head(concern_topline, 10)
## subject modeldate party very_estimate somewhat_estimate
## 1 concern-infected 12/1/2020 all 34.10675 33.67834
## 2 concern-economy 12/1/2020 all 54.11890 31.45103
## 3 concern-infected 11/30/2020 all 34.10675 33.67834
## 4 concern-economy 11/30/2020 all 54.11890 31.45103
## 5 concern-infected 11/29/2020 all 34.10675 33.67834
## 6 concern-economy 11/29/2020 all 54.11890 31.45103
## 7 concern-infected 11/28/2020 all 34.18683 34.03282
## 8 concern-economy 11/28/2020 all 54.11890 31.45103
## 9 concern-infected 11/27/2020 all 34.18683 34.03282
## 10 concern-economy 11/27/2020 all 54.11890 31.45103
## not_very_estimate not_at_all_estimate timestamp
## 1 18.334785 12.077454 09:15:33 1 Dec 2020
## 2 9.104874 3.647811 09:15:30 1 Dec 2020
## 3 18.334785 12.077454 09:11:03 1 Dec 2020
## 4 9.104874 3.647811 09:10:55 1 Dec 2020
## 5 18.334785 12.077454 09:11:07 1 Dec 2020
## 6 9.104874 3.647811 09:10:57 1 Dec 2020
## 7 18.338019 11.871891 20:50:24 28 Nov 2020
## 8 9.104874 3.647811 20:50:21 28 Nov 2020
## 9 18.338019 11.871891 20:46:03 28 Nov 2020
## 10 9.104874 3.647811 20:45:55 28 Nov 2020
In this part we will convert the data and do the following process:
- Convert
modeldate
into date format - Filter data to only concern about infection
- Create long format table from the four responses
- Tidy the responses category name
- Prepare the output for the chart tooltip
- Create a new column that transform
modeldate
into proper timestamp format forhighcharter
df_concern <- concern_topline %>%
# convert modeldate into time format
mutate(
modeldate= mdy(modeldate)
) %>%
arrange(modeldate) %>%
# Filter data to only concern about infection
filter(subject %>% str_detect("infect")) %>%
select(-c(subject, timestamp, party)) %>%
# Create long format table from the four responses
pivot_longer(-modeldate) %>%
mutate(
# Tidy the responses category name
name = name %>%
str_remove_all("_estimate") %>%
str_replace_all("_", " ") %>%
str_to_title(),
# Prepare the output for the chart tooltip
tooltip = paste0(name, ": ", number(value, accuracy = 0.01, suffix = "%")),
# Create timestamp for chart
timestamp = datetime_to_timestamp(modeldate)
)
head(df_concern, 10)
## # A tibble: 10 x 5
## modeldate name value tooltip timestamp
## <date> <chr> <dbl> <chr> <dbl>
## 1 2020-02-15 Very 11.8 Very: 11.76% 1581724800000
## 2 2020-02-15 Somewhat 26.3 Somewhat: 26.28% 1581724800000
## 3 2020-02-15 Not Very 39.1 Not Very: 39.08% 1581724800000
## 4 2020-02-15 Not At All 20.0 Not At All: 20.00% 1581724800000
## 5 2020-02-16 Very 16.4 Very: 16.43% 1581811200000
## 6 2020-02-16 Somewhat 24.8 Somewhat: 24.81% 1581811200000
## 7 2020-02-16 Not Very 37.8 Not Very: 37.80% 1581811200000
## 8 2020-02-16 Not At All 19.7 Not At All: 19.74% 1581811200000
## 9 2020-02-17 Very 16.4 Very: 16.43% 1581897600000
## 10 2020-02-17 Somewhat 24.8 Somewhat: 24.81% 1581897600000
Survey Polls
We will load the Survey Polls regarding poeple concern about COVID-19.
covid_concern <- read.csv("data_input/covid/covid_concern_polls.csv")
glimpse(covid_concern)
## Rows: 595
## Columns: 15
## $ start_date <chr> "2020-01-27", "2020-01-31", "2020-02-02", "2020-02-07",...
## $ end_date <chr> "2020-01-29", "2020-02-02", "2020-02-04", "2020-02-09",...
## $ pollster <chr> "Morning Consult", "Morning Consult", "YouGov", "Mornin...
## $ sponsor <chr> "", "", "Economist", "", "Huffington Post", "Economist"...
## $ sample_size <int> 2202, 2202, 1500, 2200, 1000, 1500, 1074, 1207, 1207, 1...
## $ population <chr> "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", ...
## $ party <chr> "all", "all", "all", "all", "all", "all", "all", "all",...
## $ subject <chr> "concern-economy", "concern-economy", "concern-infected...
## $ tracking <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,...
## $ text <chr> "How concerned are you that the coronavirus will impact...
## $ very <dbl> 19, 26, 13, 23, 11, 11, 22, 22, 22, 10, 10, 32, 32, 41,...
## $ somewhat <dbl> 33, 32, 26, 32, 24, 28, 23, 35, 21, 28, 30, 37, 39, 37,...
## $ not_very <dbl> 23, 25, 43, 24, 33, 39, 37, 28, 33, 42, 40, 18, 17, 14,...
## $ not_at_all <dbl> 11, 7, 18, 9, 20, 22, 19, 15, 23, 19, 20, 6, 5, 5, 16, ...
## $ url <chr> "https://morningconsult.com/wp-content/uploads/2020/02/...
There are two type of polls, concern related to economy and concern related to health risk and safety. We will use the later one.
concern_infection <- covid_concern %>%
filter(subject == "concern-infected")
We will do similar step to prepare the long format version of the data. For the date we will use the end_date
of each survey. If the sponsor the survey is missing/NA, we will italic the data. The <i>
symbol indicates that we will italic the text.
df_infect <- concern_infection %>%
mutate_at(vars(contains("date")), ymd) %>%
select(end_date, very, somewhat, not_very, not_at_all, pollster, sponsor) %>%
pivot_longer(-c(end_date, pollster, sponsor)) %>%
mutate(name = name %>%
str_remove_all("_estimate") %>%
str_replace_all("_", " ") %>%
str_to_title(),
sponsor = ifelse(sponsor == "", "<i>NA</i>", sponsor)
)
df_infect <- df_infect %>%
filter(end_date >= min(df_concern$modeldate)) %>%
mutate(timestamp = datetime_to_timestamp(end_date))
df_infect
## # A tibble: 1,612 x 6
## end_date pollster sponsor name value timestamp
## <date> <chr> <chr> <chr> <dbl> <dbl>
## 1 2020-02-16 AP-NORC <i>NA</i> Very 22 1581811200000
## 2 2020-02-16 AP-NORC <i>NA</i> Somewhat 23 1581811200000
## 3 2020-02-16 AP-NORC <i>NA</i> Not Very 37 1581811200000
## 4 2020-02-16 AP-NORC <i>NA</i> Not At All 19 1581811200000
## 5 2020-02-18 Kaiser Family Foundation <i>NA</i> Very 22 1581984000000
## 6 2020-02-18 Kaiser Family Foundation <i>NA</i> Somewhat 21 1581984000000
## 7 2020-02-18 Kaiser Family Foundation <i>NA</i> Not Very 33 1581984000000
## 8 2020-02-18 Kaiser Family Foundation <i>NA</i> Not At All 23 1581984000000
## 9 2020-02-18 YouGov Economist Very 10 1581984000000
## 10 2020-02-18 YouGov Economist Somewhat 28 1581984000000
## # ... with 1,602 more rows
Event
As the vertical line that notify important event, we will manually create the data.frame with information gained from the original plot.
tribble(~event, ~date,
"First U.S. death reported", "2020-02-29",
"U.S. deaths surpass 10,000", "2020-04-06",
"U.S. deaths surpass 100,000", "2020-05-28",
"Trump diagnosed with COVID-19", "2020-10-2"
)
## # A tibble: 4 x 2
## event date
## <chr> <chr>
## 1 First U.S. death reported 2020-02-29
## 2 U.S. deaths surpass 10,000 2020-04-06
## 3 U.S. deaths surpass 100,000 2020-05-28
## 4 Trump diagnosed with COVID-19 2020-10-2
Here, I directly transform the event data into HTML format to suite the text shown in the original plot. The date are also transformed into timestamp so they will properly positioned in the highcharter
x-axis.
<i>
: italic<br>
: line break
event_date <- tribble(~event, ~date,
"<i>First U.S.<br><i>death<br><i>reported</i>", "2020-02-29",
"<i>U.S. deaths<br><i>surpass<br><i>10,000</i>", "2020-04-06",
"<i>U.S. deaths<br><i>surpass<br><i>100,000</i>", "2020-05-28",
"<i>Trump<br><i>diagnosed<br><i>with<br><i>COVID-19</i>", "2020-10-2"
) %>%
mutate(timestamp = ymd(date) %>% datetime_to_timestamp())
event_date
## # A tibble: 4 x 3
## event date timestamp
## <chr> <chr> <dbl>
## 1 <i>First U.S.<br><i>death<br><i>reported</i> 2020-02-29 1.58e12
## 2 <i>U.S. deaths<br><i>surpass<br><i>10,000</i> 2020-04-06 1.59e12
## 3 <i>U.S. deaths<br><i>surpass<br><i>100,000</i> 2020-05-28 1.59e12
## 4 <i>Trump<br><i>diagnosed<br><i>with<br><i>COVID-19</i> 2020-10-2 1.60e12
Prepare Color
Here we will prepare the color of each responses category. For the scatter plot, we will use transparent color by transforming the color using hex_to_rgba()
and set the transparency into 0.2. You can use the colorpick eyedropper extension from Google Chrome if you want to get the color yourself.
The color_group
will be used to indicate the color both for the line chart and the scatter plot.
df_color <- data.frame(name = unique(df_concern$name),
color = c("#F56B38", "#F9AA87", "#DAAFD6", "#BF7CCC")
) %>%
mutate(color_opaque = hex_to_rgba(color, 0.2))
df_concern <- df_concern %>%
left_join(df_color)
df_infect <- df_infect %>%
left_join(df_color)
# Final Color Scheme
scatter_color <- df_color$color_opaque[4:1]
line_color <- df_color$color[4:1]
color_group <- c(line_color, scatter_color)
Tooltip
Finally, we create the tooltip for each data. For the line chart, we set the font size to 14 pixels using font type of Roboto Slab
(you can see all available font from Google Font API). For the scatter plot, we will show the responses and the percentage, the pollster and the sponsor. This part is different from the original plot that completely remove all tooltip from the scatter plot. All fonts from the Five Thirty Eight are commisioned fonts we can’t get them directly.
df_concern <- df_concern %>%
mutate(
tooltip = paste0("<span style='font-size: 14px; font-family: Roboto Slab; color: ",color,"'>", name, ": ", number(value, accuracy = 0.01, suffix = "%"), "</span>")
)
df_infect <- df_infect %>%
mutate(
tooltip = paste0("<span style='color: ",color,"'>", name, ": ", number(value, accuracy = 0.01, suffix = "%"), "</span><br>",
"Pollster: ", pollster, "<br>", "Sponsor: ", sponsor)
)
head(df_concern, 10)
## # A tibble: 10 x 7
## modeldate name value tooltip timestamp color color_opaque
## <date> <chr> <dbl> <chr> <dbl> <chr> <chr>
## 1 2020-02-15 Very 11.8 <span style='font-si~ 1.58e12 #F56B~ rgba(245,107~
## 2 2020-02-15 Somew~ 26.3 <span style='font-si~ 1.58e12 #F9AA~ rgba(249,170~
## 3 2020-02-15 Not V~ 39.1 <span style='font-si~ 1.58e12 #DAAF~ rgba(218,175~
## 4 2020-02-15 Not A~ 20.0 <span style='font-si~ 1.58e12 #BF7C~ rgba(191,124~
## 5 2020-02-16 Very 16.4 <span style='font-si~ 1.58e12 #F56B~ rgba(245,107~
## 6 2020-02-16 Somew~ 24.8 <span style='font-si~ 1.58e12 #F9AA~ rgba(249,170~
## 7 2020-02-16 Not V~ 37.8 <span style='font-si~ 1.58e12 #DAAF~ rgba(218,175~
## 8 2020-02-16 Not A~ 19.7 <span style='font-si~ 1.58e12 #BF7C~ rgba(191,124~
## 9 2020-02-17 Very 16.4 <span style='font-si~ 1.58e12 #F56B~ rgba(245,107~
## 10 2020-02-17 Somew~ 24.8 <span style='font-si~ 1.58e12 #F9AA~ rgba(249,170~
Visualization
Finally, we create the interactive chart using highcharter
. First, we will create the basic plot, including the scatter plot and the line chart. To remove the legend form the chart, we use the hc_legend()
and set the color using the hc_color
. The hcaes
is where you will include the information regarding the x-Axis, y-Axis, and the color grouping based on the responses category.
hchart(df_concern, "line", hcaes(x = modeldate, y = value, group = name), lineWidth = 4) %>%
hc_add_series(data = df_infect,
type = "scatter", hcaes(x = end_date, y = value, group = name)) %>%
hc_colors(colors = color_group) %>% # Adjust Color
hc_legend(enabled = F) # Remove Legend
Next, we add the setting for the chart title, subtitle, x-Axis and y-Axis. During the creation of the x-axis, we also include the vertical line that correspond to important events related to COVID-19 using the plotLines
.
hchart(df_concern, "line", hcaes(x = modeldate, y = value, group = name), lineWidth = 4) %>%
hc_add_series(data = df_infect,
type = "scatter", hcaes(x = end_date, y = value, group = name)) %>%
hc_colors(colors = color_group) %>%
hc_legend(enabled = F) %>%
hc_title(text = "How worried are Americans about infection?",
style = list(fontWeight = "bold", fontSize = "19px")
) %>%
hc_subtitle(text = "How concerned Americans say they are that they, someone in their family or someone else they know will<br>become infected with the coronavirus",
style = list(fontSize = "12px", color = "black", fontFamily = "Roboto Slab")) %>%
hc_yAxis(title = list(text = ""),
labels = list(style = list(fontFamily = "Roboto Slab", fontWeight = "bold", color = "#A2A2BF"),
formatter = JS("function(){return(this.value + '%')}")), # Create % format for y-axis
tickInterval= 25 # Interval of x-Axis (0, 25, 50, 75)
) %>%
hc_xAxis(title = list(text = ""),
labels = list(style = list(fontFamily = "Roboto Slab", color = "#A2A2BF", fontWeight = "bold")),
dateTimeLabelFormats = list(month = "%m/%d", week = "%m/%d"), # Format of the date label
plotLines = list(
list(value = event_date$date[1], color = "gray",
label = list(text = event_date$event[1], rotation = 0, align="center"),
zIndex = 1000, dashStyle = "longdash"),
list(value = event_date$date[2], color = "gray",
label = list(text = event_date$event[2], rotation = 0, align="center"),
zIndex = 1000, dashStyle = "longdash"),
list(value = event_date$date[3], color = "gray",
label = list(text = event_date$event[3], rotation = 0, align="center"),
zIndex = 1000, dashStyle = "longdash"),
list(value = event_date$date[4], color = "gray",
label = list(text = event_date$event[4], rotation = 0, align="center"),
zIndex = 1000, dashStyle = "longdash")
)
)
Finally, we add the setting for the tooltip. The tooltip is the most challenging part since the highcharter cannot fully replicate the tooltip from the original plot.
hchart(df_concern, "line", hcaes(x = modeldate, y = value, group = name), lineWidth = 4) %>%
hc_add_series(data = df_infect,
type = "scatter", hcaes(x = end_date, y = value, group = name)) %>%
hc_colors(colors = color_group) %>%
hc_legend(enabled = F) %>%
hc_title(text = "How worried are Americans about infection?",
style = list(fontWeight = "bold", fontSize = "19px")
) %>%
hc_subtitle(text = "How concerned Americans say they are that they, someone in their family or someone else they know will<br>become infected with the coronavirus",
style = list(fontSize = "12px", color = "black", fontFamily = "Roboto Slab")) %>%
hc_yAxis(title = list(text = ""),
labels = list(style = list(fontFamily = "Roboto Slab", fontWeight = "bold", color = "#A2A2BF"),
formatter = JS("function(){return(this.value + '%')}")),
tickInterval= 25 ) %>%
hc_xAxis(title = list(text = ""),
labels = list(style = list(fontFamily = "Roboto Slab", color = "#A2A2BF", fontWeight = "bold")),
dateTimeLabelFormats = list(month = "%m/%d", week = "%m/%d"),
plotLines = list(
list(value = event_date$date[1], color = "gray",
label = list(text = event_date$event[1], rotation = 0, align="center"),
zIndex = 1000, dashStyle = "longdash"),
list(value = event_date$date[2], color = "gray",
label = list(text = event_date$event[2], rotation = 0, align="center"),
zIndex = 1000, dashStyle = "longdash"),
list(value = event_date$date[3], color = "gray",
label = list(text = event_date$event[3], rotation = 0, align="center"),
zIndex = 1000, dashStyle = "longdash"),
list(value = event_date$date[4], color = "gray",
label = list(text = event_date$event[4], rotation = 0, align="center"),
zIndex = 1000, dashStyle = "longdash")
)
) %>%
hc_tooltip(crosshairs = TRUE,
backgroundColor = "white",
fillOpacity = 0.5,
shared = TRUE,
borderWidth = 0,
useHTML = T,
headerFormat = "<b><span style='font-size:16px; font-family: Roboto Slab; color: gray;'>{point.x: %b %d, %Y}</span></b><br>",
pointFormat = "<b>{point.tooltip}</b><br>"
)