Animating U.S. COVID-19 hotspots over time

This is an extension of my previous post on visualizing COVID-19 in Arkansas, and the code used is similar. As such, I won’t bury the lede again by walking through the code first. Here’s the animation:

Again, I’m using the plasma color palette from the viridis R package to show hot and cold spots intuitively, and again the color scale for the number of cases is shown on log scale. One nice thing about this color scale (at least as of the time of writing) is that the changes in color correspond pretty nicely to each order of magnitude on the log scale. As with before, this image is set to update daily, so this post should be current throughout the coronavirus pandemic.

One design choice in this animation is different than in the Arkansas visualization. As discussed in the previous post, I elected to use the median county size (rounded to the nearest 5,000) for the per capita calculations. A commenter mentioned that powers of 10 are customary in public health reporting. While I completely agree that’s customary, I chose the median value of 20,000 to use for per capita calculations as providing a better intuitive feel for the actual number of cases in most counties in the state without having to do a lot of mental math. There’s more explanation in the comments on that post.

For the entire U.S., the median county population is 25,000 (when rounded to the nearest 5,000). However, the mean county population in the U.S. is about 102,000, which is very close to a power of 10 that would customarily be used for public health reporting. As such, I would have a harder time justifying a design choice different than what’s customary. So, this animation uses the customary 100,000 figure for per capita calculations.

What do you think about this animation?

UPDATE 2020-10-10: You’ll need an API key from the U.S. census to run the code; tidycensus has a function to install the API key into your ~/.Renviron file. I added a couple of commented lines to demonstrate how to use this. It’s not necessary to include this in the code after the API key is installed, which is why I inadvertently left it out of the post. I should also note the animation takes about 3 hours to render on a laptop that’s a few years old, as gganimate doesn’t currently have parallelization for rendering gifs.

The code for the post follows:

library(tidyverse)
library(lubridate)
library(plotly)
library(gganimate)
library(tidycensus)
library(transformr)
library(ggthemes)
library(viridis)

options( scipen = 10 ) # print full numbers, not scientific notation

covid_cases <- read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv")
covid_cases <- pivot_longer(covid_cases, 12:length(covid_cases), names_to = "date", values_to = "cases") %>%
  mutate(date = lubridate::as_date(date, format = "%m/%d/%y", tz = "UTC"))

# census_api_key(YOUR_API_KEY_HERE, install = TRUE) # installs key to ~/.Renviron for future use
# readRenviron("~/.Renviron") # Only necessary after API key is first installed
population <- tidycensus::get_estimates(geography = "county", "population") %>% 
  mutate(GEOID = as.integer(GEOID)) %>%
  pivot_wider(
    names_from = variable,
    values_from = value
  )

# Per capita calculation is to nearest 5k of median county population
per_capita <- population %>% 
  summarize(avg = mean(POP)) %>% 
  unlist() %>%
  plyr::round_any(., 1e4)

roll_us_cases <- covid_cases %>% 
  filter(`Country_Region` == "US" | `Country_Region` == "United States") %>%
  filter(Province_State != "Puerto Rico") %>%
  filter(FIPS < 80000) %>%
  # filter(Province_State != "Alaska" & Province_State != "Hawaii") %>%
  filter(Admin2 != "Unassigned") %>%
  arrange(date) %>%
  group_by(UID) %>%
  mutate(prev_count = lag(cases)) %>%
  mutate(prev_count = ifelse(is.na(prev_count), 0, prev_count)) %>%
  mutate(new_cases = cases - prev_count) %>%
  mutate(roll_cases = round(zoo::rollapply(new_cases, 7, mean, fill = 0, align = "right", na.rm = T)))%>%
  ungroup() %>%
  select(-prev_count) %>%
  left_join(
    population %>% select(-NAME),
    by = c("FIPS" = "GEOID")
  ) %>%
  mutate(
    cases_capita = round(cases / POP * per_capita), # cases per 100,000 residents
    new_capita = round(new_cases / POP * per_capita), # cases per 100,000 residents
    roll_capita = round(roll_cases / POP * per_capita) # rolling new cases per 100,000 residents
  )

# tidycensus version
# Includes Alaska and Hawaii as rescaled and shifted
data("county_laea")
data("state_laea")

first_date <- min({ roll_us_cases %>%
    group_by(date) %>%
    summarize(roll_cases = sum(roll_cases)) %>%
    ungroup() %>%
    filter(roll_cases > 0) %>%
    select(date)
}$date)

temp <- roll_us_cases %>%
  filter(date >= first_date) %>%
  mutate(roll_capita = ifelse(roll_capita <= 0, 1, roll_capita)) %>% # log10 scale plot
  mutate(roll_cases = ifelse(roll_cases <= 0, 1, roll_cases)) # log10 scale plot

temp_sf <- county_laea %>%
  mutate(GEOID = as.numeric(GEOID)) %>%
  mutate(GEOID = ifelse(GEOID == 46113, 46102, GEOID)) %>% # SD Oglala Lakota County name change
  mutate(GEOID = ifelse(GEOID == 2270, 2158, GEOID)) %>% # AK Kusilvak census area
  inner_join(temp, by = c("GEOID" = "FIPS"))

days <- NROW(unique(temp$date))

p <- ggplot() +
  geom_sf(data = temp_sf, aes(fill = roll_capita), size = 0) +
  geom_sf(data = state_laea, fill = "transparent", color = alpha("gray70", 0.25), size = 0.75) +
  scale_fill_viridis(
    name = "7-day rolling cases: ",
    trans = "log10",
    option = "plasma",
  ) +
  ggthemes::theme_map() +
  theme(legend.position = c(0.5, 0.01), legend.direction = "horizontal") +
  labs(
    title = paste0("US 7-day rolling average of new COVID cases per ", scales::comma(per_capita), " residents"),
    subtitle = "Date: {frame_time}"
  ) +
  transition_time(date)

anim <- animate(
  p, 
  nframes = days + 10 + 30, 
  fps = 5, 
  start_pause = 10, 
  end_pause = 30,
  res = 96,
  width = 800,
  height = 600,
  units = "px"
)

# Be sure to insert your own save location here -- a missing directory will cause errors
anim_save("images/us_covid_rolling_cases_plasma.gif", animation = anim)
anim

8 Comments

  1. Charles Knell

    After installing the packages, I ran the code and got this error message:

    “Error in tidycensus::get_estimates(geography = “county”, “population”) :
    A Census API key is required. Obtain one at http://api.census.gov/data/key_signup.html, and then supply the key to the `census_api_key` function to use it throughout your tidycensus session.”

    I don’t see a call to this function in your code, so I don’t know how to use the api key I got from the census web site.
    Please advise, thanks.

    • Thanks for the question! If you’ve downloaded an API key from the census website, you install it like this:

      census_api_key("111111abc", install = TRUE)
      # First time, reload your environment so you can use the key without restarting R.
      readRenviron("~/.Renviron")

      After that, you don’t have to have the function call in the code, as the key is read from the environment variable. That’s why I accidentally left this out of the post. I updated the post to show this. Thanks for asking!

      • Charles Knell

        It really does take a long time to run! When I got to the end, I got this warning:
        “Warning message:
        file_renderer failed to copy frames to the destination directory ”

        I followed the path deep into the /var directory and found gganim_plot001.png through gganim_plot225.png. How may file should there be?

        Finally, when I ran the anim_save() function I go this error message:
        “Error: The animation object does not specify a save_animation method”

        Reading the help file for this function, I see this:
        “…, the returned object must implement a save_animation method to be able to be used with anim_save(). This is provided natively for gif_image and magick-image objects.”

        I looked through your code and couldn’t find any calls creating one of these.
        What’s missing?
        Thanks.

        • I’m not entirely sure. The code is set up in an R package, and the save location in my code is a subdirectory named images. The error makes it sound like that directory doesn’t exist (the “destination directory” error). I’d address the save location first.

          The anim object is the right kind of object for anim_save(). ~225 sounds like the right number of images that need to be combined. If it’s not a simple missing directory error, I wonder if there’s something up with your magick installation? That’s the underlying software gganimate uses to create gifs. I already had it installed when I began this project, so I don’t have a specific recollection of any issues with it. Maybe you need to add the output of ‘which magick’ to your PATH?

  2. Ed Barker

    Suffers from a conflation of scale as most map representations of covid-19 do. Better would be to establish a unit area, New York uses zip code areas, and then do the mapping. It is not useful to use county areas because of the enormous differences in population densities within and among counties. A better approach would be to map Standard Metropolitan Statistical Areas, SMSA’s, use the 20 most populous cities, those cities are major transportation, i.e. dispersal, locations due to the presence of international airports, seaports. Eliminate the counties containing the SMSA’s and then map only the counties along the major highways, or railways. See what that animation shows.

  3. Bob

    Hello, thanks for publishing this. Looking at your code, I don’t understand the calculation of `per_capita`. You haven’t grouped, so `summarize` will just return a single number. And in fact that calculation just returns the single number 100,000 (as it should since there are about 3300 counties and a population of 330,000,000).

    • Right, summarize uses the whole dataset if you don’t group. 100k is the per capita number I’m using (the per capita number in the plot title is dynamically generated). I did this the tidy way (instead of using the mean function) because during EDA I looked at using the median instead, as I did with the state version.

  4. Pingback: Understanding COVID19 in Connecticut. It takes a town | R-bloggers

Leave a Reply

Your email address will not be published. Required fields are marked *