An abbreviated version of this article is published on the Berkeley Economic Review as Partisanship and COVID-19 Response.
In November of last year, I was assigned a cool final project in statistics class: take a COVID-19 dataset and dig around for anything interesting. The dataset included state-level COVID-19 cases, deaths, and a handful of related variables (like ICU availability, lockdown status, age). I snuck in an extra variable: the results of the 2020 election.
I was inspired to look at partisanship by previous studies that I’d read. One of them found that states with Democratic governments had better pandemic responses than those with Republican governments, even after controlling for other variables. A later study confirmed that while Republican-led states fared better in the beginning, Democrat-led states have had lower case rates since June 8th.
Part of the explanation lies in individual behavior. For example, Fox News consumption turns out to be correlated with less physical distancing. More strikingly, researchers have shown that partisanship is the strongest predictor of mobility, that is, how likely someone is to travel. These differences in distancing have, in turn, been associated with increased fatality rates.
None of these studies, however, took advantage of the 2020 election results. These are interesting data points. Time reported that, unsettlingly, the counties with the most COVID cases were also the biggest fans of President Trump. Election results offer more information than a binary red/blue label on state governments: they measure the magnitude of political support and distinguish Republicanism from Trumpism (otherwise, we’d have put my home state of Maryland in the same basket as my previous home, North Dakota).
I set out to examine the relationship between election results and COVID outcomes.
For the class project, I chose to simply correlate and plot partisanship against COVID cases and deaths. The dataset only included COVID outcomes from June and October, so I used those numbers. I pulled election results from here and subtracted Trump’s proportion of the vote from Biden’s (for Maryland, which went 65.8% to 32.4% for Biden, this would be 0.334). I dubbed this measure of partisanship the “Biden margin.”
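As a minimal sketch, the margin is just a difference of two vote shares. The function name `biden_margin` is my own label for illustration; the Maryland figures are the ones quoted above.

```python
def biden_margin(biden_share: float, trump_share: float) -> float:
    """Biden's vote share minus Trump's, both given as proportions in [0, 1]."""
    return biden_share - trump_share


# Maryland, using the shares quoted in the text: 65.8% Biden, 32.4% Trump.
maryland = biden_margin(0.658, 0.324)
print(round(maryland, 3))  # 0.334
```

A positive value means the state leaned toward Biden, a negative value toward Trump, and the magnitude shows by how much.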
Below, I plotted case rate against the death rate (which means being in the top-right is very bad) and colored the points according to Biden’s margin of victory. The plots speak for themselves: in June, blue states suffered from higher death rates, and red states tended to have more cases; by October, the data points aligned, and the states with both the most deaths and the most cases were red.
So what changed between June and October? I plotted the difference in cases and deaths between June and October against Biden’s margin. Here, partisanship is significantly correlated with both the increase in cases (r=0.483) and the increase in deaths (r=-0.589). All of this suggests that, from June to October, partisanship significantly influenced state success in containing COVID-19.
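The r values reported here are Pearson correlation coefficients. A minimal pure-Python sketch of the computation follows; the numbers in it are made up purely for illustration and are not the project's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, NOT the article's data: five states' Biden margins
# and their June-to-October increase in cases per 100,000.
margins = [-0.30, -0.10, 0.05, 0.20, 0.35]
case_increase = [2100, 1800, 1500, 1400, 900]
print(round(pearson_r(margins, case_increase), 3))
```

A value near -1 or 1 indicates a strong linear relationship and a value near 0 a weak one. The sign shows whether the outcome rises or falls as the margin moves toward Biden.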
But these data faced some important limitations. First, state-level data is highly aggregated, which exaggerates county-level effects and makes it difficult to establish significance. Second, the cases and deaths data were limited to June and October totals, which is a problem if we want to examine more recent trends or specific days. I had a good visual for a class project, but not a thorough analysis.
I decided to build my own, county-level dataset. To test the strength of the partisanship-COVID relationship, I tried to control for significant predictors of COVID cases, such as race, density, and geographic location. To test potential mechanisms for the relationship, I also collected data on mask use and mobility. I compiled,[^2] with assistance from other data aggregators, the following data sources:
- Population density from the 2010 U.S. Census.
- Demographics (race, income, age, etc.) from the 2010 U.S. Census.
- COVID cases and deaths compiled from New York Times data.
- County mask usage from a New York Times survey.
- Testing rates collected by Johns Hopkins University.
- Mobility data from the Department of Transportation.
- Population-weighted geographic centers from the 2010 U.S. Census.
- 2016 and 2020 presidential election results scraped from news sources.
- 2020 state-level election results I compiled previously.
After merging the datasets, I had complete features for 3,111 instances, which span nearly every county. Below are summary statistics for each feature, with the COVID cases/deaths omitted for brevity.
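The merge step can be sketched in plain Python, assuming each source is a dictionary keyed by a county identifier. The function `merge_complete`, the keys, the field names, and the values below are hypothetical placeholders, not the article's data, and the actual compiler may work differently.

```python
def merge_complete(*tables):
    """Inner-join dicts keyed by county ID, keeping only complete rows.

    A county survives only if it appears in every table and none of its
    merged fields is None.
    """
    if not tables:
        return {}
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)

    merged = {}
    for key in sorted(common):
        row = {}
        for table in tables:
            row.update(table[key])
        if all(value is not None for value in row.values()):
            merged[key] = row
    return merged


# Hypothetical placeholder records keyed by a made-up county ID.
density = {"00001": {"density": 120.0}, "00002": {"density": 45.5},
           "00003": {"density": None}}
votes = {"00001": {"county_margin": 0.12}, "00002": {"county_margin": -0.30},
         "00003": {"county_margin": 0.05}}
masks = {"00001": {"mask_always": 0.71}, "00003": {"mask_always": 0.55}}

merged = merge_complete(density, votes, masks)
print(merged)
```

Here only county `00001` survives: `00002` is missing from the mask survey and `00003` has no density value. The same kind of filtering is why a merged dataset ends up slightly smaller than the full list of counties.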
Exploratory Data Analysis
I began by validating the state-level analysis and looking for new trends. I isolated the COVID cases and deaths over April, June, October, and December, which roughly represent the different stages of the pandemic. Based on previous literature, we would expect June to be the “turning point” where blue states began to outperform red states.
I calculated a new “Biden margin” at the county level and plotted it against cases below. The trend is generally validated: while bluer counties had more cases in April (r=0.22), that trend began to reverse in June (r=0.17), with a stronger, opposite relationship in December (r=-0.22).
A similar story holds for the death rates, which are plotted below (April is omitted due to lack of data). From June (r=0.20) to October (r=-0.12) to December (r=-0.23), the relationship between partisanship and COVID deaths reversed. By December, every single county with more than 150 deaths per 100,000 was red. The differences are stark.
Two further observations are in order:
First, 2016 election results are not as strongly correlated with December cases (-0.20 vs. -0.22) or deaths (-0.22 vs. -0.23). This suggests that political alignments have shifted even over the last four years.
Second, state-level election results are less predictive of December cases (-0.15 vs. -0.22) and much less predictive of deaths (-0.07 vs. -0.23). This confirms that county-level granularity boosts predictive power.
Next, I looked at the impact of mobility and mask use on cases in December. Mobility is measured by the total trips per person in November. Mask use is measured by the percent of people who reported wearing masks “all the time” on a survey.
As one might expect, mask use has a significant negative correlation with cases (r=-0.24). More importantly, mask use seems to be more correlated with Biden’s margin, which suggests that it could be a mediator. On the other hand, mobility is (unintuitively) negatively correlated with cases, with the most mobile counties tending to be red.
I used the following variables in regression:
- The December case variable and `dec_deaths` refer to December COVID cases and deaths per 100,000, respectively.
- `tests` refers to state-level COVID tests per 100,000.
- `density` is residents per square mile.
- `income` is the median household income.
- `elderly` is the proportion of residents aged 60 and older.
- `hispanic` and the other race variables are the proportions of residents of the respective races.
- `latitude` is the geographic latitude of the state.
- `work_distance` is the average miles driven to work.
- `household_size` is the average number of residents per household.
- `county_margin` is the Democratic margin at the county level, while `state_margin` is the Democratic margin at the state level.
- `trips` (and mask use) are outlined above; `at_home` is a measure of the number of people staying home.
- `icu_rate` is the number of ICUs per 100,000.
I used four linear models for case rates.
Model 1 uses standard explanatory variables to predict case rate. Model 2 tests the marginal contribution of partisanship to explaining variance. Model 3 assesses the marginal impact of state partisanship and interactions with county partisanship. Model 4 controls for social distancing measures.
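One way to write down a Model 3-style specification is sketched below. The notation is my own: I assume the standard explanatory variables enter linearly as a vector of controls, which the text does not spell out, and the symbols are illustrative rather than the exact formula that was fit.

```latex
\[
y_i = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_i
      + \gamma_c\, m^{\text{county}}_i
      + \gamma_s\, m^{\text{state}}_{s(i)}
      + \gamma_{cs}\, m^{\text{county}}_i\, m^{\text{state}}_{s(i)}
      + \varepsilon_i
\]
\[
\frac{\partial y_i}{\partial m^{\text{county}}_i}
      = \gamma_c + \gamma_{cs}\, m^{\text{state}}_{s(i)}
\]
```

Here $y_i$ is the case rate in county $i$, $\mathbf{x}_i$ collects the controls, $m^{\text{county}}_i$ is `county_margin`, and $m^{\text{state}}_{s(i)}$ is `state_margin` for the state containing county $i$. The second line shows why the interaction matters: the association between county partisanship and cases is allowed to depend on how the surrounding state voted.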
I used four similar models for death rates.
The motivations behind these latter four models are similar. Model 5 will predict death rates from population features. Model 6 and 7 will assess the impact of county and state partisanship. Model 8 will control for social distancing measures and ICU availability.
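The idea of measuring the marginal contribution of partisanship can be illustrated by fitting two nested least-squares models and comparing their R². The sketch below uses pure Python and synthetic toy data generated on the spot. It is not the article's dataset, the variable names are only loosely borrowed from the list above, and the coefficients used to generate the data are arbitrary.

```python
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def r_squared(features, y):
    """Fit y ~ intercept + features by OLS (normal equations); return R^2."""
    X = [[1.0] + list(row) for row in features]
    k = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic toy data (NOT the article's data): standardized income and margin.
rng = random.Random(0)
income = [rng.gauss(0, 1) for _ in range(200)]
margin = [rng.gauss(0, 1) for _ in range(200)]
deaths = [50 - 3 * inc - 5 * mar + rng.gauss(0, 10)
          for inc, mar in zip(income, margin)]

r2_base = r_squared([[inc] for inc in income], deaths)
r2_full = r_squared([[inc, mar] for inc, mar in zip(income, margin)], deaths)
print(f"baseline R^2 = {r2_base:.3f}, with partisanship = {r2_full:.3f}")
print(f"incremental R^2 = {r2_full - r2_base:.3f}")
```

Because the second model contains every variable in the first, its R² can never be lower. The size of the gap is what indicates how much extra variance the added variable explains.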
The regression results for case rates are below.
|Model 1||Model 2||Model 3||Model 4|
|county_margin x state_margin||-1417||.||906***|
Significance: *** = p < 0.001; ** = p < 0.01; * = p < 0.05
The regression results for death rates are below.
|Model 5||Model 6||Model 7||Model 8|
|county_margin x state_margin||-49||.||367***|
Significance: *** = p < 0.001; ** = p < 0.01; * = p < 0.05
Here are eight insights we can draw from our eight models:
- Explaining local variation is hard. Model 8, the best model, explains only 13.7% of the variation in county-level deaths. In contrast, using fewer features than we used, a previous study accounted for 69% of the variation in state-level cases. The difference suggests researchers conducting regression on state-level data should be cautious about extrapolating results to counties. It also indicates case rates are complex and may be influenced by factors outside of a county’s control.
- Density doesn’t matter. Against intuition but in line with past studies, density has no significant relationship with either cases or deaths. More interestingly, larger households and longer commutes are associated with fewer cases and deaths. I speculate that these variables actually correlate with living in the suburbs, which may in turn be more socially distanced.
- Race and class do matter. But not exactly how you might think. A higher median household income is associated with significantly fewer cases, but it has little to no effect on death rates after controlling for partisanship. And Hispanic populations seem to have a highly significant association with death rates, a result corroborated by other studies.
- Partisanship mediates some relationships. The proportion of black residents initially had a highly significant, depressing effect on both case rates and death rates, but the effect disappears after controlling for partisanship. The same goes for latitude. Initially, latitude correlates with fewer cases and a small increase in deaths. After controlling for partisanship, the association with cases shrinks and the association with deaths becomes highly significant. Both these variables are likely confounded by partisanship: bluer counties are farther north and more diverse.
- Old people get sick less, but die more. The proportion of people aged 60 and older has a highly significant negative effect on the case rate, but also a significant positive effect on the death rate. The paradox reflects a sad reality on the ground: older communities are trying harder to suppress COVID, but ultimately still die at higher rates.
- Democrats have done better. County partisanship has a highly significant impact on outcomes in every model. After controlling for standard explanatory variables, county election results explain an additional 2% of the variance in both cases and deaths. In that model, each 1% increase in the margin of Democratic votes is associated with a decrease of 6.4 cases and 0.32 deaths per 100,000.
- Politics are complicated. For case rates, the coefficients of state and county partisanship are similar (-666 vs. -620). This reflects the structure of governance: if you’re in a blue county like Harris County (Houston), policies passed by the red state legislature and governor will still greatly impact you. At the same time, it means that even in the absence of supporting state policies, local leaders can make a significant impact on case rates. Interestingly, while the state-county partisanship interaction variable is always highly significant, state partisanship is not significant for predicting death rates. Perhaps this suggests that local policies are greater determinants of COVID mortality.
- Social distancing works. Mask use and fewer trips have highly significant depressing effects on both case rate and death rate. They mediate some of the observed relationship between partisanship, cases, and deaths, helping us explain part of why bluer states do better: they practice more social distancing. Strangely, however, the proportion of at-home residents is positively associated with case and death rates. Perhaps the direction of causation is backwards, and an existing crisis prompts people to stay at home, but then we would expect to see the effect for trips as well. It could also be that being at home correlates with other risk factors for COVID, such as age or joblessness. It isn’t clear.
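To make the coefficient reading above concrete, here is a small worked example using the per-point figures quoted in the list (6.4 fewer cases and 0.32 fewer deaths per 100,000 for each one-point increase in the Democratic margin). The function name is hypothetical, and the calculation assumes the linear association holds across the whole shift with every other variable held fixed. It describes an association, not a causal forecast.

```python
# Per-point associations quoted in the text, per 100,000 residents,
# for each 1 percentage point increase in the Democratic margin.
CASES_PER_POINT = -6.4
DEATHS_PER_POINT = -0.32


def predicted_change(margin_shift_points: float) -> dict:
    """Linear change in outcomes associated with a shift in margin (in points)."""
    return {
        "cases_per_100k": CASES_PER_POINT * margin_shift_points,
        "deaths_per_100k": DEATHS_PER_POINT * margin_shift_points,
    }


# Two otherwise identical counties whose Democratic margins differ by 10 points.
shift = predicted_change(10)
print({name: round(value, 2) for name, value in shift.items()})
```

Under these assumptions, a 10-point gap in margin corresponds to roughly 64 fewer cases and about 3 fewer deaths per 100,000 in the bluer county.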
This article has examined the relationship between partisanship, COVID outcomes, and related explanatory variables. The results largely corroborate existing research on standard explanatory factors and highlight the difficulty of explaining county-by-county variance.
County partisanship has a highly significant impact on case rates and death rates that persists after controlling for explanatory variables. State partisanship is highly significant for cases, and the interaction between state and county partisanship is significant for both outcomes. The effect is mediated in part by social distancing measures, specifically mask use and decreased travel.
The analysis was subject to several limitations:
- The test data were only available at the state level, so the models may have failed to fully control for testing rates.
- Many demographic features used 2010 Census data, which may not reflect the current state of the country.
- Some data seemed to misreport cases or deaths, resulting in decreases in cumulative case counts. These were included for comprehensiveness, but may have skewed the results.
- Normality and homoscedasticity were assumed for the data, but that assumption may be inaccurate, particularly for the household size and latitude variables.
There are several avenues for future research:
- Researchers can incorporate dates and extent of policy interventions like restaurant closures and gathering restrictions as explanatory variables.
- The analysis could be updated with 2020 Census and ACS data.
- The influence of 2012 or 2016 election results could be compared to those of 2020 to evaluate changes in political attitude.
- The time frame can be extended into the future months or past months to assess the change in relationships over time.
- Causal inference techniques could be used to examine the relationship between cases and staying at home.
Thanks to those who maintained the datasets that this analysis used. The data sources and compiler are available in my repo and can be used to compile updated statistics.