Elections can be challenging to model. Philadelphia, in a Democratic primary, has 66 wards, over 100 candidates, and thousands of issues. And wards, candidates, and issues don’t exactly wear signs that declare what “type” they are. We broadly talk about “party” candidates, “reform” candidates, “machine” candidates, but how can we identify them, and know that those factors are really what drive elections? Where do race and class come into play?
Political scientists have a family of methods for statistically identifying unobserved traits of groups and candidates by looking at which of them typically appear together: for instance, those who vote for the same bills, or who are endorsed by the same candidates. In national elections, the main dimension generally maps along liberal-conservative divides. These methods are behind the headlines diagnosing increasing polarization.
In this post, we apply this latent factor identification to Philadelphia Democratic primaries, looking at the wards in which candidates typically over-perform. What are the different types of candidates in Philadelphia’s Democratic Party, and in which wards does each type find its support? By looking only within the Democratic Party, we aren’t going to find an obvious left-right division. Instead, we will identify groups within the Democratic Party’s tent. In similar work by Konstantin Kashin and John Myles White in San Francisco, which inspired this post, there was a stark split between the Democratic Party’s Progress slate and its Reform slate, which outsiders can broadly think of as the Hillary-wing and the Bernie-wing.
The Method
For this analysis, we allow wards and candidates to have a score in each of two dimensions. We don’t specify what these dimensions represent, but we hope that their meaning will be apparent in the results (spoiler: it is).
The total number of votes received by Candidate C in Ward W is estimated as:
log(Votes for Candidate C in Ward W) = log(Ward W’s turnout) + City-wide adjustment for Candidate C + scale_1 × (Ward W’s score in dimension 1 × Candidate C’s score in dimension 1) + scale_2 × (Ward W’s score in dimension 2 × Candidate C’s score in dimension 2)
We use the statistical software Stan to simultaneously identify each ward’s scores and each candidate’s scores. The scores are centered at zero, so wards and candidates will sometimes have negative scores; a candidate with a negative score in dimension 1 will do well in a ward with a similarly negative score in dimension 1. The sign isn’t meaningful; we could switch all of the positives with negatives and the results wouldn’t change. The scales capture the relative size of the effect of each dimension.
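To make the formula concrete, here is a minimal Python sketch of the prediction step only. It is not the Stan model itself, and the turnout, city-wide adjustment, and scores below are hypothetical numbers chosen for illustration. The second scale, 0.18, is likewise an assumption, picked to be roughly consistent with a boost of about 1.2. The sketch also checks the claim that flipping the sign of every ward and candidate score leaves the prediction unchanged.

```python
import math

def expected_votes(turnout, candidate_adjustment, ward_scores, candidate_scores, scales):
    """Expected votes for one candidate in one ward under the log-linear formula."""
    log_votes = math.log(turnout) + candidate_adjustment
    # Each dimension adds scale * (ward score * candidate score) on the log scale.
    for scale, ward_score, cand_score in zip(scales, ward_scores, candidate_scores):
        log_votes += scale * ward_score * cand_score
    return math.exp(log_votes)

# Hypothetical inputs, for illustration only.
scales = (0.33, 0.18)          # 0.18 is an assumed value for dimension 2
ward = (1.0, -0.5)             # a ward's scores in dimensions 1 and 2
candidate = (1.0, 0.5)         # a candidate's scores in dimensions 1 and 2
turnout = 10_000
adjustment = math.log(0.2)     # a candidate who wins about 20% city-wide

base = expected_votes(turnout, adjustment, ward, candidate, scales)

# Flip the sign of every score: the products, and so the prediction, are unchanged.
flipped = expected_votes(
    turnout,
    adjustment,
    tuple(-w for w in ward),
    tuple(-c for c in candidate),
    scales,
)

print(base, flipped)
```

Because only the product of a ward’s score and a candidate’s score enters the formula, the two printed numbers match, which is why the sign of a dimension carries no meaning on its own.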
We fit the model on the Philadelphia Democratic primaries from 2009 to 2016, using only city-wide elections that were competitive (including ballot questions). You can download the data and code from GitHub.
The Dimensions of Philadelphia Elections
The result of the model is two different dimensions, and each ward and candidate gets a score for each. The model doesn’t have anything to say about what the dimensions might mean (it simply mechanically identifies groups of wards that tend to vote for the same candidates), but visual patterns emerge, and I have chosen ad hoc names for each of the dimensions. Remember that we control for a candidate’s city-level support, so these dimensions only capture the differences between wards.
I should point out that this method also doesn’t identify what causes the patterns. When we see a candidate do well in a certain type of ward, it may be that something about them was attractive to voters in that ward, or it could be that they campaigned more aggressively there, or simply had better name recognition in those neighborhoods. When I identify correlates of the patterns, it isn’t to imply a direct cause from one to the other.
Dimension 1: White-Black Divide
Dimension 1 inescapably identifies a White-Black divide in Democratic primaries, among lower- and middle-income wards. The wards with large negative values in this dimension (remember that the sign itself isn’t important) are in South Philly, Manayunk, Kensington, and the Greater Northeast. The wards with large positive values are in West, North, and Northwest Philly. These wards tend to vote for the same candidates, and it appears after the fact that the unifying factor is probably race. This dimension has a scale of 0.33, which means that a candidate with a score of 1 in a ward with a score of 1 would have their votes in that ward multiplied by about exp(0.33) ≈ 1.4 (and a candidate with a score of -1 in that same ward would have theirs multiplied by about 1/1.4 ≈ 0.7).
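The arithmetic behind that scale can be checked in a couple of lines of Python. This is only a worked example of the exponentiation, using the reported scale of 0.33 and scores of exactly 1 and -1; the variable names are mine.

```python
import math

scale_1 = 0.33  # reported scale of Dimension 1

# Ward score 1, candidate score 1: the dimension adds +0.33 on the log scale.
boost = math.exp(scale_1 * 1 * 1)

# Ward score 1, candidate score -1: the dimension subtracts 0.33 on the log scale.
penalty = math.exp(scale_1 * 1 * -1)

print(round(boost, 2), round(penalty, 2))  # prints: 1.39 0.72
```

The two multipliers are reciprocals of each other, so a candidate on the matching end of the spectrum gains by the same factor that a candidate on the opposite end loses.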
Remember that we are controlling for a candidate’s overall performance, so it could certainly be the case that a candidate won in a ward on the opposite end of the spectrum; however, that candidate would do disproportionately worse in that ward—and win by less—than in a ward at the same end.
Who are the candidates that do well in these wards?
The wards with negative scores (in South Philly, Kensington, the Northeast) are where Bernie Sanders did particularly well. These wards also had strong support for Jim Kenney, Lynne Abraham, Frank Rizzo, and Helen Gym. (Candidates may appear twice on this chart if they ran in two different elections in our time frame.)
On the other end of the spectrum, the wards with positive scores (in West Philly, North Philly) are where Hillary Clinton did disproportionately well. The wards also voted disproportionately for Milton Street, Seth Williams, and Anthony Hardy Williams.
This dimension may also capture trust in government institutions. In addition to scores for candidates, we have similar scores for ballot questions. The wards with negative scores in Dimension 1 (South Philly, the Northeast) were much more likely to vote “No” on just about every ballot question, including creating commissions on Universal Pre-K and Women, forcing City agencies to develop language access plans, and creating an Office of Sustainability. The only two on which they were more likely to vote “Yes” were allowing the city to publish notices in places besides newspapers, and abolishing Traffic Court.
Dimension 2: Hispanic-Wealthy Divide
The second dimension identifies a voting split between predominantly wealthy wards (Center City, Mount Airy) and predominantly Hispanic wards, including North Philly. (An aside: calling this a “wealthy-vs.-Hispanic” divide is inartful, incorrectly implying that the two are mutually exclusive. However, it is true that Philadelphia’s wealthiest wards occupy one end of this spectrum, and its mostly Hispanic wards, the other end.) The scale of this dimension was somewhat smaller; a candidate with a score of 1 in a ward with a score of 1 had their votes multiplied by about 1.2.
Interestingly, this division did not affect the presidential election; both Hillary Clinton and Bernie Sanders have scores of zero on this dimension, meaning that they did not perform differently in wards based on this spectrum.
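A score of zero has a simple mechanical meaning in the formula, which a short Python sketch can show. The exact scale of Dimension 2 isn’t stated here, only the multiplier of about 1.2, so the sketch backs out an assumed scale of log(1.2); the helper function name is also mine.

```python
import math

# Assumption: only the multiplier (about 1.2) is reported, so the scale
# is backed out here as log(1.2), roughly 0.18.
scale_2 = math.log(1.2)

def dimension_multiplier(scale, ward_score, candidate_score):
    """Multiplicative effect of one dimension on a candidate's votes in a ward."""
    return math.exp(scale * ward_score * candidate_score)

# A candidate with a score of zero gets a multiplier of exactly 1 in every ward,
# whatever that ward's own score is.
zero_score_effects = [
    dimension_multiplier(scale_2, ward_score, 0.0) for ward_score in (-1.0, 0.0, 1.0)
]

# A candidate with a score of 1 gets the full effect in a ward with a score of 1.
full_effect = dimension_multiplier(scale_2, 1.0, 1.0)

print(zero_score_effects, full_effect)
```

In other words, a zero score switches the dimension off for that candidate: their ward-by-ward results are explained by turnout, their city-wide adjustment, and the other dimension.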
Wards with negative scores in this dimension—in North Philly, and South and Northeast Philly—disproportionately voted for Nelson Diaz, Milton Street, and Frank Rizzo. Wards with positive scores—Center City, Mount Airy—disproportionately voted for Helen Gym, Paul Steinke, and Seth Williams.
Whereas the first dimension captured a simplistic Yes-No difference on ballot questions, this dimension divides questions on content. Notable questions on which these wards disagreed include creating the Office of Property Assessment (“Yes” votes came from Center City and Mount Airy) and allowing the posthumous promotion of firefighters (“Yes” votes came from North and South Philly and the Northeast).
What will happen in May?
This descriptive analysis can help us understand May’s upcoming primary. We don’t know exactly where this year’s candidates will rank on these dimensions—these scores are only assigned after the fact—but candidates can look along these lists, identify candidates who seem similar to themselves, and guess where they might fall in these dimensions. That, in turn, can tell them in which wards they might fare better and where they may fare worse.
One foretelling note is that Dimension 2 seems to line up one-to-one with relative turnout in D.A. primaries. The wards with positive scores in this dimension—in Center City and Mount Airy—are also the wards where voters are most likely to show up in May, while the wards with negative scores—in North Philly—are the wards where D.A. turnout lags the most. So expect Dimension 2 to prove especially relevant this year.
Jonathan Tannen, Ph.D., was previously a Director at Econsult Solutions, Inc (ESI). Jonathan’s dissertation research used GIS and large-scale computational techniques to develop a Bayesian method to measure the movement of neighborhood boundaries.