The COVID-19 pandemic
COVID-19 is a global pandemic that has led to severe global socioeconomic disruption and, subsequently, the largest global recession since the Great Depression. Countries were put on full lockdown, curfew, and nationwide quarantine. Some countries have declared a state of emergency.
More than a third of the planet’s population is under some form of restriction.
Business Insider
Governments around the world temporarily closed educational institutions in their attempt to contain the spread, along with other social distancing measures.
These nationwide closures are impacting over 91% of the world’s student population.
UNESCO
While public health, commercial and clinical laboratories work around the clock to test for new cases of COVID-19, analysts, policymakers and the media rely on these data (lab-confirmed infections) to report the number and monitor the growth of “confirmed cases”.
These data are important; as one researcher put it, they are “our window onto the pandemic and how it is spreading”. Without data, we have no way of understanding the spread of the pandemic and, consequently, no way of responding to this threat appropriately.
Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE)
The Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE), with support from the ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL), started to gather data from a list of sources that includes the WHO, independent organizations, and government-released statistics:
- World Health Organization (WHO): https://www.who.int/
- DXY.cn. Pneumonia. 2020. http://3g.dxy.cn/newh5/view/pneumonia.
- BNO News: https://bnonews.com/index.php/2020/02/the-latest-coronavirus-cases/
- National Health Commission of the People’s Republic of China (NHC): http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml
- China CDC (CCDC): http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm
- Hong Kong Department of Health: https://www.chp.gov.hk/en/features/102465.html
- Macau Government: https://www.ssm.gov.mo/portal/
- Taiwan CDC: https://sites.google.com/cdc.gov.tw/2019ncov/taiwan?authuser=0
- US CDC: https://www.cdc.gov/coronavirus/2019-ncov/index.html
- Government of Canada: https://www.canada.ca/en/public-health/services/diseases/coronavirus.html
- Australia Government Department of Health: https://www.health.gov.au/news/coronavirus-update-at-a-glance
- European Centre for Disease Prevention and Control (ECDC): https://www.ecdc.europa.eu/en/geographical-distribution-2019-ncov-cases
- Ministry of Health Singapore (MOH): https://www.moh.gov.sg/covid-19
- Italy Ministry of Health: http://www.salute.gov.it/nuovocoronavirus
- 1Point3Acres: https://coronavirus.1point3acres.com/en
- Worldometers: https://www.worldometers.info/coronavirus/
- COVID Tracking Project: https://covidtracking.com/data. (US Testing and Hospitalization Data. We use the maximum reported value from “Currently” and “Cumulative” Hospitalized for our hospitalization number reported for each state.)
- French Government: https://dashboard.covid19.data.gouv.fr/
The data is collected using an automated script and pushed to a GitHub repository that they maintain.
The data source to our COVID-19 Dashboard
For the first half of this course, we will use a “cleaned” version of the dataset, extracted from JHU CSSE. This is, of course, provided to you as a convenient CSV in the Materials tab of the course. We will learn about various data visualization and plotting techniques in R using this dataset. The advantages of working with this CSV:
- This simplifies the learning by allowing us to focus on what really matters: data visualization and the ggplot visualization library. The CSV I’ve provided to you has been wrangled into the shape we need for our exercise.
- We don’t have to incur internet charges trying to read from JHU CSSE (“remote source”) every time our code performs a “read” operation
- It’s faster, and because it’s local, you can store it anywhere you want and make quick experiments without being connected to the internet
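As a rough sketch of what working with a local CSV looks like, the snippet below writes a tiny stand-in file, reads it back, and builds a line chart. The column names (date, country, confirmed) and the values are made up for illustration; the cleaned CSV in the Materials tab may be shaped differently.

```r
library(ggplot2)

# Hypothetical stand-in for the course CSV: the real file name, column
# names, and values will differ. The numbers here are made up.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    date = rep(c("2020-03-01", "2020-03-02", "2020-03-03"), times = 2),
    country = rep(c("A", "B"), each = 3),
    confirmed = c(10, 20, 40, 5, 15, 30)
  ),
  csv_path,
  row.names = FALSE
)

# Reading a local file needs no network access, so this step is fast
# and repeatable.
cases <- read.csv(csv_path, stringsAsFactors = FALSE)
cases$date <- as.Date(cases$date)

# One line per country, confirmed cases over time.
p <- ggplot(cases, aes(x = date, y = confirmed, colour = country)) +
  geom_line() +
  labs(x = "Date", y = "Confirmed cases", colour = "Country")

# In an interactive session, print(p) draws the chart.
```

Because the file lives on disk, we can rerun the read and tweak the plot as often as we like while offline.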
As we develop our COVID-19 web dashboard, we will replace this with a direct call to JHU CSSE’s repository (“remote source”). This means some extra “overhead”, since our R script needs to include the preprocessing steps. However, it has the benefits of:
- Being a true “real-time” dashboard, showing our app visitors the latest figures on confirmed cases, deaths, and recoveries
- Not having to worry about storage. Since our dashboard calls the remote source on demand, we don’t have to set up additional facilities to host our CSV (which can grow quite large) or provision a database
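To give a feel for the preprocessing “overhead”, here is a minimal sketch in base R. It assumes the remote file is a wide table with one column per date (named like 1/22/20) next to Province/State, Country/Region, Lat and Long columns; both that layout and the URL in the comment are assumptions to check against the repository, and wide_to_long is a helper name made up for this example.

```r
# Hypothetical helper: reshape a wide table (one column per date) into
# one row per country and date, summing provinces within a country.
wide_to_long <- function(wide) {
  id_cols <- c("Province/State", "Country/Region", "Lat", "Long")
  date_cols <- setdiff(names(wide), id_cols)
  long <- data.frame(
    country = rep(wide[["Country/Region"]], times = length(date_cols)),
    date = rep(as.Date(date_cols, format = "%m/%d/%y"), each = nrow(wide)),
    confirmed = unlist(wide[date_cols], use.names = FALSE),
    stringsAsFactors = FALSE
  )
  aggregate(confirmed ~ country + date, data = long, FUN = sum)
}

# Made-up sample that mimics the assumed layout.
sample_wide <- data.frame(
  `Province/State` = c("P1", "P2", NA),
  `Country/Region` = c("X", "X", "Y"),
  Lat = c(0, 1, 2),
  Long = c(0, 1, 2),
  `1/22/20` = c(1, 2, 5),
  `1/23/20` = c(3, 4, 6),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

cases_long <- wide_to_long(sample_wide)

# With a connection, the same helper could be pointed at the remote
# source (URL is an assumption; verify it before relying on it):
# jhu_url <- paste0(
#   "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/",
#   "csse_covid_19_data/csse_covid_19_time_series/",
#   "time_series_covid19_confirmed_global.csv"
# )
# cases_long <- wide_to_long(read.csv(jhu_url, check.names = FALSE))
```

Passing check.names = FALSE keeps the original column names, so the date columns can be parsed directly.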
Course Syllabus
High level overview: developing a web analytics app in R using Shiny
The Outcome
Our dashboard will be web-based, so it is accessible to anyone with an internet connection. We want this dashboard to be responsive so it scales up to wide screen monitors but also “rearranges” its elements to fit nicely on a mobile phone.
Usually, this requires a developer to learn HTML, CSS, and JavaScript along with a server-side language like Python or R. However, using the Shiny web app framework, we will only write our code in R. The Shiny framework “translates” our R code into the HTML + CSS + JavaScript required for all the front-end action.
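For a first taste, a minimal Shiny app might look like the sketch below. It is not the dashboard we will build: the data is made up, the input and output names are placeholders, and base R plotting stands in for ggplot to keep the example short.

```r
library(shiny)

# Made-up values; the real app would use the COVID-19 dataset.
cases <- data.frame(
  day = rep(1:5, times = 2),
  country = rep(c("A", "B"), each = 5),
  confirmed = c(1, 2, 4, 8, 16, 1, 3, 5, 7, 9)
)

# fluidRow() and column() use a 12-unit grid: the two columns sit side
# by side on wide screens and stack on narrow ones.
ui <- fluidPage(
  titlePanel("COVID-19 dashboard (sketch)"),
  fluidRow(
    column(4, selectInput("country", "Country", choices = unique(cases$country))),
    column(8, plotOutput("trend"))
  )
)

# The server function redraws the plot whenever the selection changes.
server <- function(input, output, session) {
  output$trend <- renderPlot({
    selected <- cases[cases$country == input$country, ]
    plot(selected$day, selected$confirmed, type = "l",
         xlab = "Day", ylab = "Confirmed cases")
  })
}

app <- shinyApp(ui, server)
# To launch the app locally: shiny::runApp(app)
```

Notice that there is no HTML, CSS, or JavaScript in the script; Shiny generates the page from the ui object.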
But what use is a web dashboard if we do not have a way to communicate our message? Data visualization, at its very essence, is about communication. So that’s where we’ll start this course.
If you’re ready, head over to the course page, download the Cleaned CSV dataset and we’ll get started!