Rplicate Series: Interactive Plot of Coronavirus Survey

Welcome again to the Rplicate Series! In this 5th article of the series. On this occassion, we will try to replicate the first interactive plot from the FiveThirtyEight article titled How Americans View The Coronavirus Crisis And Trump’s Response. This time you’ll learn how to build an interactive plot using highcharter.



Library and Setup

Below is the required package that we will use during data wrangling and chart creation.

# Data Wrangling
library(tidyverse)
library(lubridate)

# Visualization
library(highcharter)
library(scales)

Data Collection

All data is available directly from the article below the plot. You can directly download them or visit this link.

Concern Top Line

We load the data that correspond to the line for each survey responses. On the survey, people can choose between four different responses: Not At All, Not Very, Somewhat, and Very for both their concern about economic and the viral infection.

concern_topline <- read.csv("data_input/covid/covid_concern_toplines.csv")

head(concern_topline, 10)
##             subject  modeldate party very_estimate somewhat_estimate
## 1  concern-infected  12/1/2020   all      34.10675          33.67834
## 2   concern-economy  12/1/2020   all      54.11890          31.45103
## 3  concern-infected 11/30/2020   all      34.10675          33.67834
## 4   concern-economy 11/30/2020   all      54.11890          31.45103
## 5  concern-infected 11/29/2020   all      34.10675          33.67834
## 6   concern-economy 11/29/2020   all      54.11890          31.45103
## 7  concern-infected 11/28/2020   all      34.18683          34.03282
## 8   concern-economy 11/28/2020   all      54.11890          31.45103
## 9  concern-infected 11/27/2020   all      34.18683          34.03282
## 10  concern-economy 11/27/2020   all      54.11890          31.45103
##    not_very_estimate not_at_all_estimate            timestamp
## 1          18.334785           12.077454 09:15:33  1 Dec 2020
## 2           9.104874            3.647811 09:15:30  1 Dec 2020
## 3          18.334785           12.077454 09:11:03  1 Dec 2020
## 4           9.104874            3.647811 09:10:55  1 Dec 2020
## 5          18.334785           12.077454 09:11:07  1 Dec 2020
## 6           9.104874            3.647811 09:10:57  1 Dec 2020
## 7          18.338019           11.871891 20:50:24 28 Nov 2020
## 8           9.104874            3.647811 20:50:21 28 Nov 2020
## 9          18.338019           11.871891 20:46:03 28 Nov 2020
## 10          9.104874            3.647811 20:45:55 28 Nov 2020

In this part we will convert the data and do the following process:

  • Convert modeldate into date format
  • Filter data to only concern about infection
  • Create long format table from the four responses
  • Tidy the responses category name
  • Prepare the output for the chart tooltip
  • Create a new column that transform modeldate into proper timestamp format for highcharter
df_concern <- concern_topline %>% 
  
  # convert modeldate into time format
  mutate(
    modeldate= mdy(modeldate)
  ) %>% 
  arrange(modeldate) %>% 
  
  # Filter data to only concern about infection
  filter(subject %>% str_detect("infect")) %>% 
  select(-c(subject, timestamp, party)) %>% 
  
  # Create long format table from the four responses
  pivot_longer(-modeldate) %>% 
  mutate( 
    # Tidy the responses category name
    name = name %>% 
      str_remove_all("_estimate") %>% 
      str_replace_all("_", " ") %>% 
      str_to_title(), 
    
    # Prepare the output for the chart tooltip
    tooltip = paste0(name, ": ", number(value, accuracy = 0.01, suffix = "%")),
    
    # Create timestamp for chart
    timestamp = datetime_to_timestamp(modeldate)
    )

head(df_concern, 10)
## # A tibble: 10 x 5
##    modeldate  name       value tooltip                timestamp
##    <date>     <chr>      <dbl> <chr>                      <dbl>
##  1 2020-02-15 Very        11.8 Very: 11.76%       1581724800000
##  2 2020-02-15 Somewhat    26.3 Somewhat: 26.28%   1581724800000
##  3 2020-02-15 Not Very    39.1 Not Very: 39.08%   1581724800000
##  4 2020-02-15 Not At All  20.0 Not At All: 20.00% 1581724800000
##  5 2020-02-16 Very        16.4 Very: 16.43%       1581811200000
##  6 2020-02-16 Somewhat    24.8 Somewhat: 24.81%   1581811200000
##  7 2020-02-16 Not Very    37.8 Not Very: 37.80%   1581811200000
##  8 2020-02-16 Not At All  19.7 Not At All: 19.74% 1581811200000
##  9 2020-02-17 Very        16.4 Very: 16.43%       1581897600000
## 10 2020-02-17 Somewhat    24.8 Somewhat: 24.81%   1581897600000

Survey Polls

We will load the Survey Polls regarding poeple concern about COVID-19.

covid_concern <- read.csv("data_input/covid/covid_concern_polls.csv")

glimpse(covid_concern)
## Rows: 595
## Columns: 15
## $ start_date  <chr> "2020-01-27", "2020-01-31", "2020-02-02", "2020-02-07",...
## $ end_date    <chr> "2020-01-29", "2020-02-02", "2020-02-04", "2020-02-09",...
## $ pollster    <chr> "Morning Consult", "Morning Consult", "YouGov", "Mornin...
## $ sponsor     <chr> "", "", "Economist", "", "Huffington Post", "Economist"...
## $ sample_size <int> 2202, 2202, 1500, 2200, 1000, 1500, 1074, 1207, 1207, 1...
## $ population  <chr> "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", ...
## $ party       <chr> "all", "all", "all", "all", "all", "all", "all", "all",...
## $ subject     <chr> "concern-economy", "concern-economy", "concern-infected...
## $ tracking    <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,...
## $ text        <chr> "How concerned are you that the coronavirus will impact...
## $ very        <dbl> 19, 26, 13, 23, 11, 11, 22, 22, 22, 10, 10, 32, 32, 41,...
## $ somewhat    <dbl> 33, 32, 26, 32, 24, 28, 23, 35, 21, 28, 30, 37, 39, 37,...
## $ not_very    <dbl> 23, 25, 43, 24, 33, 39, 37, 28, 33, 42, 40, 18, 17, 14,...
## $ not_at_all  <dbl> 11, 7, 18, 9, 20, 22, 19, 15, 23, 19, 20, 6, 5, 5, 16, ...
## $ url         <chr> "https://morningconsult.com/wp-content/uploads/2020/02/...

There are two type of polls, concern related to economy and concern related to health risk and safety. We will use the later one.

concern_infection <- covid_concern %>% 
  filter(subject == "concern-infected")

We will do similar step to prepare the long format version of the data. For the date we will use the end_date of each survey. If the sponsor the survey is missing/NA, we will italic the data. The <i> symbol indicates that we will italic the text.

df_infect <- concern_infection %>% 
  mutate_at(vars(contains("date")), ymd) %>% 
  select(end_date, very, somewhat, not_very, not_at_all, pollster, sponsor) %>% 
  pivot_longer(-c(end_date, pollster, sponsor)) %>% 
  mutate(name = name %>% 
           str_remove_all("_estimate") %>% 
           str_replace_all("_", " ") %>% 
           str_to_title(),
         sponsor = ifelse(sponsor == "", "<i>NA</i>", sponsor)
         )

df_infect <- df_infect %>% 
  filter(end_date >= min(df_concern$modeldate)) %>% 
  mutate(timestamp = datetime_to_timestamp(end_date))

df_infect
## # A tibble: 1,612 x 6
##    end_date   pollster                 sponsor   name       value     timestamp
##    <date>     <chr>                    <chr>     <chr>      <dbl>         <dbl>
##  1 2020-02-16 AP-NORC                  <i>NA</i> Very          22 1581811200000
##  2 2020-02-16 AP-NORC                  <i>NA</i> Somewhat      23 1581811200000
##  3 2020-02-16 AP-NORC                  <i>NA</i> Not Very      37 1581811200000
##  4 2020-02-16 AP-NORC                  <i>NA</i> Not At All    19 1581811200000
##  5 2020-02-18 Kaiser Family Foundation <i>NA</i> Very          22 1581984000000
##  6 2020-02-18 Kaiser Family Foundation <i>NA</i> Somewhat      21 1581984000000
##  7 2020-02-18 Kaiser Family Foundation <i>NA</i> Not Very      33 1581984000000
##  8 2020-02-18 Kaiser Family Foundation <i>NA</i> Not At All    23 1581984000000
##  9 2020-02-18 YouGov                   Economist Very          10 1581984000000
## 10 2020-02-18 YouGov                   Economist Somewhat      28 1581984000000
## # ... with 1,602 more rows

Event

As the vertical line that notify important event, we will manually create the data.frame with information gained from the original plot.

tribble(~event, ~date,
        "First U.S. death reported", "2020-02-29",
        "U.S. deaths surpass 10,000", "2020-04-06",
        "U.S. deaths surpass 100,000", "2020-05-28",
        "Trump diagnosed with COVID-19", "2020-10-2"
        )
## # A tibble: 4 x 2
##   event                         date      
##   <chr>                         <chr>     
## 1 First U.S. death reported     2020-02-29
## 2 U.S. deaths surpass 10,000    2020-04-06
## 3 U.S. deaths surpass 100,000   2020-05-28
## 4 Trump diagnosed with COVID-19 2020-10-2

Here, I directly transform the event data into HTML format to suite the text shown in the original plot. The date are also transformed into timestamp so they will properly positioned in the highcharter x-axis.

  • <i>: italic
  • <br>: line break
event_date <- tribble(~event, ~date,
                      "<i>First U.S.<br><i>death<br><i>reported</i>", "2020-02-29",
                      "<i>U.S. deaths<br><i>surpass<br><i>10,000</i>", "2020-04-06",
                      "<i>U.S. deaths<br><i>surpass<br><i>100,000</i>", "2020-05-28",
                      "<i>Trump<br><i>diagnosed<br><i>with<br><i>COVID-19</i>", "2020-10-2"
                      ) %>% 
  mutate(timestamp = ymd(date) %>% datetime_to_timestamp())

event_date
## # A tibble: 4 x 3
##   event                                                  date          timestamp
##   <chr>                                                  <chr>             <dbl>
## 1 <i>First U.S.<br><i>death<br><i>reported</i>           2020-02-29      1.58e12
## 2 <i>U.S. deaths<br><i>surpass<br><i>10,000</i>          2020-04-06      1.59e12
## 3 <i>U.S. deaths<br><i>surpass<br><i>100,000</i>         2020-05-28      1.59e12
## 4 <i>Trump<br><i>diagnosed<br><i>with<br><i>COVID-19</i> 2020-10-2       1.60e12

Prepare Color

Here we will prepare the color of each responses category. For the scatter plot, we will use transparent color by transforming the color using hex_to_rgba() and set the transparency into 0.2. You can use the colorpick eyedropper extension from Google Chrome if you want to get the color yourself.

The color_group will be used to indicate the color both for the line chart and the scatter plot.

df_color <- data.frame(name = unique(df_concern$name),
                       color = c("#F56B38",  "#F9AA87", "#DAAFD6", "#BF7CCC")
                       ) %>% 
  mutate(color_opaque = hex_to_rgba(color, 0.2))

df_concern <- df_concern %>% 
  left_join(df_color) 

df_infect <- df_infect %>% 
  left_join(df_color) 

# Final Color Scheme
scatter_color <- df_color$color_opaque[4:1]
line_color <- df_color$color[4:1]
color_group <- c(line_color, scatter_color)

Tooltip

Finally, we create the tooltip for each data. For the line chart, we set the font size to 14 pixels using font type of Roboto Slab (you can see all available font from Google Font API). For the scatter plot, we will show the responses and the percentage, the pollster and the sponsor. This part is different from the original plot that completely remove all tooltip from the scatter plot. All fonts from the Five Thirty Eight are commisioned fonts we can’t get them directly.

df_concern <- df_concern %>% 
   mutate(
    tooltip = paste0("<span style='font-size: 14px; font-family: Roboto Slab; color: ",color,"'>", name, ": ", number(value, accuracy = 0.01, suffix = "%"), "</span>")
    )

df_infect <- df_infect %>% 
   mutate(
    tooltip = paste0("<span style='color: ",color,"'>", name, ": ", number(value, accuracy = 0.01, suffix = "%"), "</span><br>",
                     "Pollster: ", pollster, "<br>", "Sponsor: ", sponsor)
    )

head(df_concern, 10)
## # A tibble: 10 x 7
##    modeldate  name   value tooltip                timestamp color  color_opaque 
##    <date>     <chr>  <dbl> <chr>                      <dbl> <chr>  <chr>        
##  1 2020-02-15 Very    11.8 <span style='font-si~    1.58e12 #F56B~ rgba(245,107~
##  2 2020-02-15 Somew~  26.3 <span style='font-si~    1.58e12 #F9AA~ rgba(249,170~
##  3 2020-02-15 Not V~  39.1 <span style='font-si~    1.58e12 #DAAF~ rgba(218,175~
##  4 2020-02-15 Not A~  20.0 <span style='font-si~    1.58e12 #BF7C~ rgba(191,124~
##  5 2020-02-16 Very    16.4 <span style='font-si~    1.58e12 #F56B~ rgba(245,107~
##  6 2020-02-16 Somew~  24.8 <span style='font-si~    1.58e12 #F9AA~ rgba(249,170~
##  7 2020-02-16 Not V~  37.8 <span style='font-si~    1.58e12 #DAAF~ rgba(218,175~
##  8 2020-02-16 Not A~  19.7 <span style='font-si~    1.58e12 #BF7C~ rgba(191,124~
##  9 2020-02-17 Very    16.4 <span style='font-si~    1.58e12 #F56B~ rgba(245,107~
## 10 2020-02-17 Somew~  24.8 <span style='font-si~    1.58e12 #F9AA~ rgba(249,170~

Visualization

Finally, we create the interactive chart using highcharter. First, we will create the basic plot, including the scatter plot and the line chart. To remove the legend form the chart, we use the hc_legend() and set the color using the hc_color. The hcaes is where you will include the information regarding the x-Axis, y-Axis, and the color grouping based on the responses category.

hchart(df_concern, "line", hcaes(x = modeldate, y = value, group = name), lineWidth = 4)  %>% 
  hc_add_series(data = df_infect, 
                type = "scatter", hcaes(x = end_date, y = value, group = name)) %>% 
  hc_colors(colors = color_group) %>% # Adjust Color
  hc_legend(enabled = F) # Remove Legend

Next, we add the setting for the chart title, subtitle, x-Axis and y-Axis. During the creation of the x-axis, we also include the vertical line that correspond to important events related to COVID-19 using the plotLines.

hchart(df_concern, "line", hcaes(x = modeldate, y = value, group = name), lineWidth = 4)  %>% 
  hc_add_series(data = df_infect, 
                type = "scatter", hcaes(x = end_date, y = value, group = name)) %>% 
  hc_colors(colors = color_group) %>% 
  hc_legend(enabled = F) %>% 
  hc_title(text = "How worried are Americans about infection?", 
           style = list(fontWeight = "bold", fontSize = "19px")
           ) %>% 
  hc_subtitle(text = "How concerned Americans say they are that they, someone in their family or someone else they know will<br>become infected with the coronavirus",
              style = list(fontSize = "12px", color = "black", fontFamily = "Roboto Slab")) %>% 
  hc_yAxis(title = list(text = ""),
           labels = list(style = list(fontFamily = "Roboto Slab", fontWeight = "bold", color = "#A2A2BF"),
                         formatter = JS("function(){return(this.value + '%')}")), # Create % format for y-axis
           tickInterval= 25 # Interval of x-Axis (0, 25, 50, 75)
           ) %>% 
  hc_xAxis(title = list(text = ""),
           labels = list(style = list(fontFamily = "Roboto Slab", color = "#A2A2BF", fontWeight = "bold")),
           dateTimeLabelFormats = list(month = "%m/%d", week = "%m/%d"), # Format of the date label
           
           plotLines = list(
             list(value = event_date$date[1], color = "gray", 
                  label = list(text = event_date$event[1], rotation = 0, align="center"),
                  zIndex = 1000, dashStyle = "longdash"),
             list(value = event_date$date[2], color = "gray", 
                  label = list(text = event_date$event[2], rotation = 0, align="center"),
                  zIndex = 1000, dashStyle = "longdash"),
             list(value = event_date$date[3], color = "gray", 
                  label = list(text = event_date$event[3], rotation = 0, align="center"),
                  zIndex = 1000, dashStyle = "longdash"),
             list(value = event_date$date[4], color = "gray", 
                  label = list(text = event_date$event[4], rotation = 0, align="center"),
                  zIndex = 1000, dashStyle = "longdash")
           )
           
           ) 

Finally, we add the setting for the tooltip. The tooltip is the most challenging part since the highcharter cannot fully replicate the tooltip from the original plot.

hchart(df_concern, "line", hcaes(x = modeldate, y = value, group = name), lineWidth = 4)  %>% 
  hc_add_series(data = df_infect, 
                type = "scatter", hcaes(x = end_date, y = value, group = name)) %>% 
  hc_colors(colors = color_group) %>% 
  hc_legend(enabled = F) %>% 
  hc_title(text = "How worried are Americans about infection?", 
           style = list(fontWeight = "bold", fontSize = "19px")
           ) %>% 
  hc_subtitle(text = "How concerned Americans say they are that they, someone in their family or someone else they know will<br>become infected with the coronavirus",
              style = list(fontSize = "12px", color = "black", fontFamily = "Roboto Slab")) %>% 
  hc_yAxis(title = list(text = ""),
           labels = list(style = list(fontFamily = "Roboto Slab", fontWeight = "bold", color = "#A2A2BF"),
                         formatter = JS("function(){return(this.value + '%')}")), 
           tickInterval= 25 ) %>% 
  hc_xAxis(title = list(text = ""),
           labels = list(style = list(fontFamily = "Roboto Slab", color = "#A2A2BF", fontWeight = "bold")),
           dateTimeLabelFormats = list(month = "%m/%d", week = "%m/%d"),
           
           plotLines = list(
             list(value = event_date$date[1], color = "gray", 
                  label = list(text = event_date$event[1], rotation = 0, align="center"),
                  zIndex = 1000, dashStyle = "longdash"),
             list(value = event_date$date[2], color = "gray", 
                  label = list(text = event_date$event[2], rotation = 0, align="center"),
                  zIndex = 1000, dashStyle = "longdash"),
             list(value = event_date$date[3], color = "gray", 
                  label = list(text = event_date$event[3], rotation = 0, align="center"),
                  zIndex = 1000, dashStyle = "longdash"),
             list(value = event_date$date[4], color = "gray", 
                  label = list(text = event_date$event[4], rotation = 0, align="center"),
                  zIndex = 1000, dashStyle = "longdash")
           )
           
           ) %>% 
  hc_tooltip(crosshairs = TRUE,
             backgroundColor = "white",
             fillOpacity = 0.5,
             shared = TRUE, 
             borderWidth = 0,
             useHTML = T,
             headerFormat = "<b><span style='font-size:16px; font-family: Roboto Slab; color: gray;'>{point.x: %b %d, %Y}</span></b><br>",
             pointFormat = "<b>{point.tooltip}</b><br>"
             )

Scroll to Top