Mapping the decline of subway ridership in the Covid era
Ridership on New York City’s subway system has declined significantly amid the coronavirus pandemic. The Metropolitan Transportation Authority (MTA), which runs the subway, is in the midst of a fiscal crisis, reportedly facing a $16 billion deficit due to the decline.
It’s easy to see the severity of the situation after visualizing the daily ridership over 2020. Within the course of one month, which included a New York statewide stay-at-home order, the MTA witnessed an unprecedented fall in straphangers. But is the decline in ridership spread equally across NYC’s diverse residents and neighborhoods?
The fall can be easily quantified by taking the difference between mean ridership before the precipice and mean ridership once it flattens out. This results in an 88% decline in system-wide ridership.
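As a minimal sketch of that calculation, the snippet below compares the mean of a "before" window with the mean of an "after" window. The daily counts are made-up placeholders, not MTA figures, and the choice of windows is an assumption.

```python
from statistics import mean

def percent_decline(before, after):
    """Relative drop from mean(before) to mean(after), as a fraction."""
    baseline = mean(before)
    return (baseline - mean(after)) / baseline

# Hypothetical daily ridership counts (arbitrary units), for illustration only.
pre_drop = [100, 110, 90, 100]   # days before the fall
post_drop = [20, 25, 15, 20]     # days after ridership flattens out

decline = percent_decline(pre_drop, post_drop)
print(f"{decline:.0%}")  # 80%
```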
Extending this basic methodology to each subway station reveals a distinct pattern playing out across the boroughs. Manhattan, home to the wealthy and the transient, saw the largest declines in ridership, while the outer boroughs, especially the Bronx, saw declines that were still large but the smallest in relative terms. A Voronoi diagram (below left) is overlaid on the map to approximate the neighborhood surrounding each subway station. This gives a reasonable approximation of who is likely using each station.
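The idea behind the Voronoi diagram can be sketched in a few lines: a location belongs to a station's cell exactly when that station is its nearest one. The station names and planar coordinates below are hypothetical, and real latitude/longitude pairs would first need to be projected to a planar coordinate system for straight-line distance to be meaningful.

```python
import math

def nearest_station(point, stations):
    """Return the name of the station closest to `point`.

    A point lies in a station's Voronoi cell exactly when that station
    is its nearest neighbor, so this is a cell-membership lookup.
    """
    return min(stations, key=lambda name: math.dist(point, stations[name]))

# Hypothetical planar coordinates (e.g. projected x/y in km), not real stations.
stations = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (0.0, 3.0)}

print(nearest_station((1.0, 0.5), stations))  # A
print(nearest_station((3.0, 1.0), stations))  # B
```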
To better understand who these riders are, the station points are projected onto Census Public Use Microdata Areas (PUMAs, below right). This ties ridership to demographic variables the Census tracks through the American Community Survey (ACS), such as income and job industry.
Regions mapped by proximity to subway station
Regions mapped by Census PUMA code
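Tying a station to a PUMA amounts to a point-in-polygon test. The sketch below uses the standard ray-casting approach on simple polygons. The region codes and square boundaries are invented for illustration; actual PUMA boundaries would come from Census shapefiles, and a GIS library would normally handle this step.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if `point` is inside `polygon`.

    `polygon` is a list of (x, y) vertices in order. Points lying exactly
    on an edge are not handled specially.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def assign_region(point, regions):
    """Return the first region code whose polygon contains `point`, else None."""
    for code, polygon in regions.items():
        if point_in_polygon(point, polygon):
            return code
    return None

# Hypothetical region codes and boundaries, not real PUMA geometry.
regions = {
    "region_1": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "region_2": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
print(assign_region((1.0, 1.0), regions))  # region_1
print(assign_region((3.0, 0.5), regions))  # region_2
print(assign_region((5.0, 5.0), regions))  # None
```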
The first question to be addressed: are smaller decreases in ridership explained by essential worker status? Essential worker status can be approximated based on the industry residents labor in. Aligning these industries to the Delaware essential industry list (one of the few discrete lists available), the proportion of essential workers can be estimated (below left). It’s difficult to verify the accuracy but it is in line with the US-wide estimate from the Economic Policy Institute which claims 55 million American workers are essential.
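A rough sketch of that estimate: sum the workers in industries that appear on the essential list and divide by all workers in the neighborhood. The industry labels and counts here are placeholders, not ACS categories or the Delaware list itself.

```python
def essential_share(workers_by_industry, essential_industries):
    """Share of a neighborhood's workers employed in essential industries."""
    total = sum(workers_by_industry.values())
    essential = sum(
        count
        for industry, count in workers_by_industry.items()
        if industry in essential_industries
    )
    return essential / total

# Hypothetical industry labels and worker counts, for illustration only.
essential_industries = {"Health care", "Grocery retail", "Transit"}
neighborhood = {
    "Health care": 300,
    "Grocery retail": 100,
    "Transit": 50,
    "Finance": 350,
    "Arts": 200,
}
print(essential_share(neighborhood, essential_industries))  # 0.45
```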
Second, how does ridership correlate with income (below right)? The New York Times found that income was a strong predictor of the emptying of a neighborhood during Covid. Their analysis was based on smartphone location data of residents and excluded commuters. Additionally, The City found that collected household trash tonnage fell 5% in the wealthier parts of Manhattan even as NYC overall had a 4.1% increase in trash. That may reflect an exodus of wealthier residents.
The income story is right in line with expectations, with Manhattan and the northern parts of Brooklyn clearly representing the wealthiest neighborhoods. The essential worker map is a mixed bag: parts of Brooklyn and Manhattan contain more essential workers than expected, while others contain fewer than expected, such as East Williamsburg and Bushwick in Brooklyn and the part of Manhattan near Columbia University. However, the Bronx and south Brooklyn contain the most, which aligns well with the ridership map.
Just as the New York Times found for neighborhood activity, a simple regression on log-transformed income shows a clear trend between a neighborhood’s household income and the decline in subway ridership at the surrounding stations (below left). However, this relationship becomes more nuanced once you control for the borough (below right), indicating that income alone is not enough to accurately predict ridership. Notably, the Bronx and Queens have different levels of ridership change at similar income levels.
Unlike income, essential worker status does not appear to have a strong correlation with ridership change across the city (below left). After controlling for borough, the Bronx and Brooklyn show clear positive relationships and Manhattan a clear negative one (below right). Manhattan is a curious case, and I suspect it has something to do with non-resident ridership.
A linear model fitted with all three variables (essential worker status, household income, and borough) tells a similar story to the plots above. Controlling for household income and borough, a 1 percentage point increase in essential worker status is associated with a change in ridership that is 0.4 percentage points higher, meaning a smaller decline. Controlling for essential worker status and borough, every 10% increase in household income is associated with a change in ridership that is 0.8 percentage points lower, meaning a larger decline.
The statistical model is specified below. The outcome is derived from a difference in count data, so a Poisson or negative binomial model may be more appropriate, and stations may also be spatially autocorrelated. Ordinary least squares (OLS) linear regression was used because its coefficients are simple to interpret. The residuals are approximately normally distributed and show no evident heteroscedasticity.
$$\begin{align} Y =\: &\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 \: + \\ &\beta_4 X_4 + \beta_5 X_5 + \epsilon \end{align}$$
\(Y =\) % change in ridership
\(X_1 =\) % of workers deemed essential
\(X_2 =\) log(household income)
\(X_3 =\) BoroughBrooklyn
\(X_4 =\) BoroughManhattan
\(X_5 =\) BoroughQueens
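To make the specification concrete, here is a minimal pure-Python sketch of fitting this model by least squares via the normal equations. It is not the code behind the article: the data are synthetic and noise-free, the coefficients used to generate them are made up, and the Bronx is assumed to be the omitted reference borough because it has no dummy variable of its own. In practice a statistics package would do this fit and also report standard errors.

```python
import math
import random

BOROUGH_DUMMIES = ["Brooklyn", "Manhattan", "Queens"]  # Bronx assumed as reference

def design_row(essential_share, household_income, borough):
    """Build [1, X1, X2, X3, X4, X5] for one neighborhood."""
    dummies = [1.0 if borough == b else 0.0 for b in BOROUGH_DUMMIES]
    return [1.0, essential_share, math.log(household_income)] + dummies

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic neighborhoods generated from made-up coefficients (not the article's data).
random.seed(0)
true_beta = [0.1, 0.5, -0.1, -0.03, -0.05, -0.02]
rows = [
    design_row(
        random.uniform(0.2, 0.7),
        random.uniform(30_000, 150_000),
        random.choice(["Bronx"] + BOROUGH_DUMMIES),
    )
    for _ in range(200)
]
y = [sum(b * x for b, x in zip(true_beta, row)) for row in rows]

beta_hat = fit_ols(rows, y)
print([round(b, 3) for b in beta_hat])
```

Because the synthetic outcome has no noise, the fit should recover `true_beta` up to floating-point error, which is a quick sanity check on the dummy coding.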
| Term | Estimate | Standard error | p value |
|---|---|---|---|
| Intercept | -0.02 | 0.16 | 0.90 |
| % of workers deemed essential | 0.43 | 0.20 | 0.04 |
| Household income (log) | -0.08 | 0.01 | 0.00 |
| Brooklyn | -0.03 | 0.01 | 0.05 |
| Manhattan | -0.04 | 0.02 | 0.03 |
| Queens | -0.02 | 0.02 | 0.15 |

Formally, the coefficients can be interpreted as:
- Controlling for household income and borough, a group of neighborhoods whose share of essential workers is 100 percentage points higher than another group's would be expected to have a change in ridership that is 43 percentage points higher. Equivalently, a 1 percentage point difference corresponds to about 0.43 percentage points.
- Controlling for essential worker status and borough, for a 10% increase in household income, the difference in the mean change in ridership between two groups of neighborhoods is expected to be -0.08 × log(1.10) ≈ -0.008, or roughly -0.8 percentage points (log is the natural logarithm).
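These interpretations can be checked by plugging values into the fitted equation. The sketch below uses the rounded estimates from the coefficient table, assumes the Bronx is the reference borough, and evaluates a hypothetical neighborhood; the function and variable names are my own. Because the estimates are rounded, the last digit may differ slightly from figures computed with full-precision coefficients.

```python
import math

# Rounded estimates copied from the coefficient table.
INTERCEPT = -0.02
B_ESSENTIAL = 0.43
B_LOG_INCOME = -0.08
# The Bronx has no dummy, so it is assumed to be the reference borough.
BOROUGH_EFFECT = {"Bronx": 0.0, "Brooklyn": -0.03, "Manhattan": -0.04, "Queens": -0.02}

def predicted_change(essential_share, household_income, borough):
    """Fitted change in ridership as a proportion (e.g. -0.70 is a 70% drop)."""
    return (
        INTERCEPT
        + B_ESSENTIAL * essential_share
        + B_LOG_INCOME * math.log(household_income)
        + BOROUGH_EFFECT[borough]
    )

# Hypothetical neighborhood: 50% essential workers, $60,000 household income.
base = predicted_change(0.50, 60_000, "Brooklyn")
richer = predicted_change(0.50, 66_000, "Brooklyn")          # income 10% higher
more_essential = predicted_change(0.51, 60_000, "Brooklyn")  # +1 percentage point

print(f"income +10%: {100 * (richer - base):.2f} percentage points")             # -0.76
print(f"essential +1 pt: {100 * (more_essential - base):.2f} percentage points")  # 0.43
```

Both differences are independent of the starting neighborhood, since the model is linear in the essential worker share and in log income.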
There are disparities in the decline in subway ridership across NYC’s diverse neighborhoods. Some of these can be statistically explained by essential worker status and household income, but it is important not to interpret them causally. The above model does not make a causal claim and may not even capture all the relevant variables. For example, if someone’s goal is to equate ridership with residents’ adherence to social distancing, the model does not account for the desertion of Manhattan by the wealthy or the impact of New Jersey and Connecticut commuters.
Explore the data yourself using the interactive map below. Hover over your neighborhood to see how you compare.
There are a handful of limitations to this data and analysis to be aware of:
2020 September
Find the code here: github.com/joemarlo/NYC-data/