Visualizing Arkansas traffic fatalities

I recently started a master’s program at UALR in information science, so I’ve been following several blogs on statistical programming and visualization. One of the best sites I’ve found is R-bloggers, which is dedicated to the popular statistical programming language R.

A recent post on R-bloggers by Lucas Puente on mapping traffic fatalities in the US caught my eye. While in private practice, I helped several Arkansas counties through the election process of voting from dry to wet. One of the often-debated issues in those races was whether the highways are safer in dry counties or wet counties. Some people believe roads are safer when alcohol is less available; other people believe that less availability means more stockpiling and more driving drunk over long distances to liquor stores after knocking a few back.

So, I decided to adapt Mr. Puente’s R program to create an Arkansas-centric map. In the R-bloggers tradition, I’ll explain the code and then present the results.

Part I: The Data

The data comes from two separate sources. First, the traffic fatality data comes from National Highway Traffic Safety Administration (NHTSA) open records, available at ftp://ftp.nhtsa.dot.gov/fars/2015/National/. As instructed by Mr. Puente, I downloaded the file FARS2015NationalDBF.zip and unzipped it.

Next, I had to get the wet/dry status of each Arkansas county. To do this, I created my own CSV (comma-separated values) file from the Arkansas Department of Finance and Administration’s wet/dry status page. I uploaded it to a public GitHub repository so others can use it.

library(foreign) # required to use read.dbf method
accidents <- read.dbf("accident.dbf")
wet_status <- read.csv(file="Ark_counties_wet_dry_status.csv")

Part II: Subsetting and Summarizing the Data

As Mr. Puente did, we’re only going to use a portion of the NHTSA’s information. However, instead of using the lower 48 states, we’re just going to use data for Arkansas. We’ll then sum the number of fatalities for all wrecks by county.

ark_accidents <- subset(accidents, STATE == 5)
ark_summary_all <- aggregate(FATALS ~ COUNTY, ark_accidents, sum) 
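
To see what the subset and aggregate steps do, here is a minimal sketch on a made-up table. The toy_ names and all of the values are invented for illustration; only the column names STATE, COUNTY, and FATALS come from the code above.

```r
# Toy stand-in for the FARS accident table (values are invented).
toy_accidents <- data.frame(
  STATE  = c(5, 5, 5, 5, 48),
  COUNTY = c(1, 1, 3, 3, 1),
  FATALS = c(1, 2, 1, 1, 4)
)

# Keep only the Arkansas rows (state FIPS code 5), as in the post.
toy_ark <- subset(toy_accidents, STATE == 5)

# One row per county, with FATALS summed across that county's wrecks.
toy_summary <- aggregate(FATALS ~ COUNTY, toy_ark, sum)
print(toy_summary)
```

The wreck from state 48 is dropped by the subset, so it never reaches the county totals.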

Next, we’ll create a logical vector that flags the wrecks involving at least one drunk driver. We’ll use that vector to sum the fatalities from those wrecks per county, then calculate the percentage of each county’s fatalities that occurred in wrecks involving a drunk driver.

ark_accidents_drunk <- ark_accidents$DRUNK_DR > 0
ark_summary_drunk <- aggregate(FATALS ~ COUNTY, ark_accidents, sum, subset=ark_accidents_drunk)
ark_summary <- merge(ark_summary_all, ark_summary_drunk, by="COUNTY", all=TRUE)
ark_summary$percent_drunk <- ark_summary$FATALS.y / ark_summary$FATALS.x * 100
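
Here is a small sketch of the same calculation on invented numbers. It shows why the merged table ends up with FATALS.x and FATALS.y columns, and why a county with no drunk driving wrecks gets NA rather than 0. The toy data and the toy_ and _by_county names are assumptions for illustration.

```r
# Toy Arkansas wrecks: county 3 has no wreck involving a drunk driver.
toy_ark <- data.frame(
  COUNTY   = c(1, 1, 3, 5),
  FATALS   = c(1, 3, 2, 1),
  DRUNK_DR = c(0, 1, 0, 2)
)

# Fatalities per county: all wrecks, then only wrecks with a drunk driver.
all_by_county   <- aggregate(FATALS ~ COUNTY, toy_ark, sum)
drunk_by_county <- aggregate(FATALS ~ COUNTY, toy_ark[toy_ark$DRUNK_DR > 0, ], sum)

# Both inputs have a FATALS column, so merge() renames them
# FATALS.x (all wrecks) and FATALS.y (drunk driving wrecks).
# all = TRUE keeps county 3, with NA in FATALS.y.
toy_summary <- merge(all_by_county, drunk_by_county, by = "COUNTY", all = TRUE)
toy_summary$percent_drunk <- toy_summary$FATALS.y / toy_summary$FATALS.x * 100
print(toy_summary)
```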

We’ll merge that data with the wet/dry status of each county. To color wet counties and dry counties on opposite ends of the same scale, we’ll multiply the percentage by -1 for wet counties.

ark_summary <- merge(ark_summary, wet_status, by.x="COUNTY", by.y="FIPS")
ark_summary$percent_drunk <- ifelse(ark_summary$wet, ark_summary$percent_drunk * -1, ark_summary$percent_drunk)
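
A quick sketch of the sign flip on invented data may help. The lookup table below only imitates the shape I am assuming for the CSV (columns FIPS, county, and wet); the wet flags here are made up and say nothing about the real status of any county.

```r
# Toy county summary, in the same shape as the table built above.
toy_summary <- data.frame(
  COUNTY        = c(1, 3, 5),
  percent_drunk = c(75, NA, 100)
)

# Hypothetical wet/dry lookup; the wet values are invented for illustration.
toy_status <- data.frame(
  FIPS   = c(1, 3, 5),
  county = c("arkansas", "ashley", "baxter"),
  wet    = c(TRUE, FALSE, FALSE)
)

toy_summary <- merge(toy_summary, toy_status, by.x = "COUNTY", by.y = "FIPS")

# Wet counties go negative, dry counties stay positive, NA stays NA.
toy_summary$percent_drunk <- ifelse(toy_summary$wet,
                                    -toy_summary$percent_drunk,
                                    toy_summary$percent_drunk)
print(toy_summary)
```

After this step, the sign of percent_drunk carries the wet/dry status and its size carries the percentage, so one diverging color scale can show both.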

Finally, we’ll get the midpoint of the range of drunk drivers per wreck, which we’ll use to color parts of our map later.

mid <- (max(ark_accidents$DRUNK_DR) + min(ark_accidents$DRUNK_DR)) / 2 # max()/min() give values; which.max()/which.min() would give row positions
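
For reference, the midpoint of a range sits halfway between its smallest and largest values. The sketch below uses invented counts and also shows the difference between max(), which returns the largest value, and which.max(), which returns the position of that value in the vector.

```r
# Invented drunk-driver counts for six wrecks.
drunk_dr <- c(0, 0, 1, 0, 3, 1)

largest_value    <- max(drunk_dr)        # the value itself
largest_position <- which.max(drunk_dr)  # where that value sits in the vector

# Midpoint of the observed range of values.
mid_toy <- (min(drunk_dr) + max(drunk_dr)) / 2
print(c(value = largest_value, position = largest_position, midpoint = mid_toy))
```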

Part III: Preparing the Map Data

The functions Mr. Puente describes in his tutorial have nice features that allow you to subset county and state map data by state.

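library(ggplot2) # provides map_data() and the plotting functions; the maps package must also be installed
county_map_data <- map_data("county", region = "arkansas")
state_map <- map_data("state", region = "arkansas")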

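I merged the county-level map data with the per-county summary, which includes the wet/dry status. Then (and this is important), I reordered the path information in the county file by its order column. Otherwise the plotting function connects the border points in the wrong sequence and draws extra lines between counties.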

county_map_data <- merge(county_map_data, ark_summary, by.x="subregion", by.y="county")
library(plyr) # necessary for arrange function
county_map_data <- arrange(county_map_data, order) # required to draw lines properly
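
Here is a toy illustration of why the reordering matters. merge() sorts its result on the key column and does not promise to keep the original row order, so the order column can come back scrambled. The data is invented, and I use base R’s order() here only to keep the sketch free of extra packages; it does the same job as arrange() for this purpose.

```r
# Toy border points for two counties; 'order' is the drawing sequence.
# County "b" is listed first on purpose so the effect is easy to see.
toy_map <- data.frame(
  subregion = c("b", "b", "b", "a", "a", "a"),
  order     = 1:6,
  long      = c(-92.0, -91.9, -91.9, -93.0, -92.9, -92.9),
  lat       = c(34.0, 34.0, 34.1, 35.0, 35.0, 35.1)
)
toy_stats <- data.frame(county = c("a", "b"), percent_drunk = c(10, -20))

merged <- merge(toy_map, toy_stats, by.x = "subregion", by.y = "county")
print(merged$order)    # "a" rows now come first, so the sequence is broken

# Restore the drawing sequence before plotting.
restored <- merged[order(merged$order), ]
print(restored$order)
```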

Part IV: Creating the Map

The goal with the visualization was to show the wet/dry status of each county alongside its share of drunk driving fatalities. So, the percent_drunk column we created earlier, which contains percentages from -100 to 100, serves as the fill. We’ll define a continuous scale that uses one color for wet counties and another for dry counties. By choosing white as the midpoint, we’ll be able to see which counties had no drunk driving wrecks in 2015, and the gradient will give some idea of how large the drunk driving share was in the others.

map<-ggplot() + 
# Add county borders:
geom_polygon(data=county_map_data, aes(x=long,y=lat,group=group, fill=percent_drunk), colour = alpha("grey", 1/4), size = 0.8) +
scale_fill_gradient2(name="Percentage of\nDrunk Drivers", midpoint=0, low="#5ab4ac", mid="white", high="#d8b365", na.value = "white", breaks=c(-100,0,100), labels=c("100% (wet county)", "No drunk driving\nfatalities", "100% (dry county)")) +
# Add state borders:
geom_polygon(data = state_map, aes(x=long,y=lat,group=group), colour = "grey", fill = NA) +

The next goal was to represent the number of fatalities in each wreck by the size of the point. The NHTSA dataset also contains an interesting data point for the number of drunk drivers involved, which we’ll use for the color of the point. It appears that one wreck in Pulaski County involved 3 drunk drivers and killed several people.

# Add points (one per wreck, sized by number of fatalities):
geom_point(data=ark_accidents, aes(x=LONGITUD, y=LATITUDE, color=DRUNK_DR, size=FATALS), alpha=0.35) +
scale_color_gradient2(name="Number of\nDrunk Drivers\nInvolved", midpoint=mid, low="lightgoldenrod4", mid="firebrick1", high="blue3", na.value="yellow") +
scale_size(name="Number of \nFatalities", range=c(3,8)) +

Finally, we’ll use Mr. Puente’s other adjustments for cleaning up the map, and then plot it.

#Adjust the map projection
coord_map("albers",lat0=39, lat1=45) +

#Add a title:
ggtitle("Arkansas Traffic Fatalities in 2015") +

#Adjust the theme:
theme_classic() +
theme(panel.border = element_blank(),
axis.text = element_blank(),
line = element_blank(),
axis.title = element_blank(),
plot.title = element_text(size=40, face="bold"))
map

The result is a detailed graphic that shows at a glance that 21 of 75 counties had no drunk driving fatalities in 2015. If you know anything about Arkansas highways, you can definitely see the outlines of US-70, I-30, I-40 west of Little Rock, and I-49 from Ft. Smith to Fayetteville.

2015 Arkansas traffic fatalities. Graphic ©2016 Nathan Chaney.
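
A count like the one quoted above could be checked with something along these lines. This is only a sketch on an invented four-county table, assuming counties with no drunk driving fatalities carry NA in percent_drunk (as they do after the merge with all=TRUE). A county with no fatal wrecks at all would not appear in the summary table, so it would have to be counted separately.

```r
# Invented summary table; NA means no drunk driving fatalities were recorded.
toy_summary <- data.frame(
  COUNTY        = c(1, 3, 5, 7),
  percent_drunk = c(-75, NA, 100, NA)
)

no_drunk <- sum(is.na(toy_summary$percent_drunk))
total    <- nrow(toy_summary)
cat(no_drunk, "of", total, "counties had no drunk driving fatalities\n")
```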

What do you take away from this visualization?

Which Arkansas counties are the most litigious?

I’ve posted before about the upcoming launch of Docket Dog, a case watching service for Arkansas state court cases. To me, one of the interesting things coming out of Docket Dog is the ability to look at different metrics for case filings in Arkansas.

I have been learning the Python programming language, which has some good visualization tools available for it. My latest exploration has been using Python to create state maps. I found this awesome tutorial by a guy with an equally awesome name (Nathan, of course!). I also wanted to view metrics per capita, so I downloaded the latest US Census data.

I wanted to figure out which Arkansas counties were the most sue-happy. So, I looked at the total number of cases filed in each county, divided by the population, and plotted the result. Each color band is a multiple of the average for that year.
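
The maps themselves were made in Python, but the arithmetic is simple enough to sketch in R, to match the code earlier on this page. Everything below is an assumption for illustration: the county names, the case counts, the populations, the band cut points, and the choice to average the county rates rather than divide statewide cases by statewide population.

```r
# Invented filings and populations for four hypothetical counties.
toy_counties <- data.frame(
  county     = c("A", "B", "C", "D"),
  cases      = c(500, 1200, 300, 4000),
  population = c(10000, 20000, 15000, 40000)
)

# Cases per resident, then each county's rate as a multiple of the average rate.
toy_counties$per_capita <- toy_counties$cases / toy_counties$population
avg_rate <- mean(toy_counties$per_capita)
toy_counties$multiple <- toy_counties$per_capita / avg_rate

# Assign each county to a color band (cut points chosen arbitrarily here).
toy_counties$band <- cut(
  toy_counties$multiple,
  breaks = c(0, 0.5, 1, 1.5, 2, Inf),
  labels = c("under 0.5x", "0.5x to 1x", "1x to 1.5x", "1.5x to 2x", "over 2x"),
  right  = FALSE
)
print(toy_counties[, c("county", "per_capita", "multiple", "band")])
```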

2015 cases per capita.
2014 cases per capita.
2013 cases per capita.
2012 cases per capita.
2011 cases per capita.
2010 cases per capita.

What do you make of this? Clark County, my old stomping ground, was about average from 2010 to 2012, but has been well above average the last couple of years.

Why do you think certain counties are more litigious than others?

How long will my case take?

Nathan here. I’m back for a guest post with some new tricks I’ve learned at my new job from some of the researchers at UAMS. I’m having a blast getting an inside look at cutting-edge biomedical research. This post looks at some data visualization about the time it takes to resolve civil tort cases in Arkansas.

Background:

One of the researchers has a master’s degree in computer science, and I picked his brain a little bit about what software packages he likes to use. He prefers Python to Perl (which I like) because Python’s research libraries are easier to use.

I took his recommendations to heart, and I’ve been tinkering with the Anaconda Python distribution, using data I’ve gathered for another project I’m releasing very soon: Docket Dog. It’s an Arkansas state court notification system. I used the data mining application Orange to perform some data visualization on the types of civil cases my dad and brother handle.

Arkansas Tort Case Length Analysis:

I took a look at over 98,000 tort cases available electronically from the Administrative Office of the Courts for which I could calculate an end date. This is what the time frames look like:

Pendency of Arkansas tort cases in years. The scale is 20 years wide.
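
Pendency here means the time between the filing date and the end date. As a rough sketch of that calculation (in R, with invented cases and hypothetical column names filed and closed, not the actual court data or the tool I used):

```r
# Invented cases; case B has no end date and is left out of the analysis.
toy_cases <- data.frame(
  case_id = c("A", "B", "C"),
  filed   = as.Date(c("2010-01-15", "2012-06-01", "2014-03-10")),
  closed  = as.Date(c("2011-07-15", NA, "2014-09-10"))
)

# Keep only cases for which an end date is available.
resolved <- toy_cases[!is.na(toy_cases$closed), ]

# Length of each case in years (365.25 days per year, to average over leap years).
resolved$years <- as.numeric(difftime(resolved$closed, resolved$filed, units = "days")) / 365.25
print(resolved)
```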

As you can see, civil court cases can take several years to resolve. We’ll see what the averages look like here in a few minutes with another chart.

In the meantime, there are several interesting patterns that appear in this chart. For instance, on the first line for product liability cases, there are several vertical bands around 9, 12, and 14–16 years. I haven’t looked into this, but I suspect each band represents a settlement of a specific type of case, like Firestone exploding tire cases, Pinto exploding car cases, or something similar.

The declaratory judgment (dec action) line is notably shorter overall than the others. Again, I haven’t researched this further, but I would expect this is because dec actions don’t involve juries and are usually about a specific question of law. For instance, lots of dec actions involve whether there is insurance coverage for a particular event or not (the hilarious Luther Sutter v. Dennis Milligan dec action notwithstanding).

Now, on to the next chart. This is called a box plot:

Comparison of median Arkansas tort case values over the last 20 years.

This chart is broken up into quartiles. The light blue box represents the middle 50% of all cases. So, 50% of motor vehicle collision (MVC) cases are decided within 2 years, with the median value being 1.6 years. (Median means the middle value; if there were 101 cases, for instance, the median value would be the 51st value.) The average MVC case length is shorter at just over 1 year.

The dark blue lines represent maximum values, excluding outliers. The dots out to the right of the graph represent those outliers, which extend out to 20 years.
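
To make the pieces of a box plot concrete, here is a sketch in R on invented case lengths. It assumes the common convention that a point counts as an outlier when it lies more than 1.5 times the box width beyond the box, which is what R’s boxplot.stats() uses; the charting tool behind the graphic above may draw its lines differently.

```r
# 101 invented case lengths in years: 99 ordinary cases plus two very long ones.
years <- c(seq(0.5, 3, length.out = 99), 12, 20)

# With 101 values, the median is the 51st value once they are sorted.
middle_value <- sort(years)[51]
print(c(median = median(years), fifty_first = middle_value))

# The box spans the first to third quartile, the middle 50% of cases.
box_edges <- quantile(years, c(0.25, 0.75))
print(box_edges)

# boxplot.stats() reports the whisker ends and the points beyond them.
stats <- boxplot.stats(years)
upper_whisker <- stats$stats[5]  # largest value that is not an outlier
outliers <- stats$out            # points plotted individually past the whisker
print(upper_whisker)
print(outliers)
```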

What’s the bottom line? For three-quarters of tort cases, you can expect resolution to take anywhere from 6 months to 3 years. Another quarter of cases take up to 4 years or so. And there are always outliers that can take many, many years to reach ultimate resolution.

What questions do you have about this analysis?