This is an exploration of 2016 US presidential campaign donations in the state of Massachusetts. For this exploratory data analysis, I am using the 2016 presidential campaign finance data from the Federal Election Commission. The dataset contains financial contribution transactions from April 18, 2015 to November 24, 2016.
Throughout the analysis, I will attempt to answer the following questions:
Which candidate received the most money?
Which candidate had the most supporters?
Who are those donors? What do they do?
How do those donors donate? Is there a pattern? If so, what is it?
Does Hillary Clinton receive more money from women than from men?
Is it possible to predict which party a donor contributes to, given his or her other characteristics?
Univariate Analysis Section
This dataset contains 295,667 contributions and 18 variables. To start, I want to take a quick look at how the contribution amounts are distributed.
There were so many outliers (extremely high and extremely low values) that it was impossible to see any detail. There were negative contributions too.
Transforming the amounts to log10 gives a better view of the distribution. On the log scale the distribution looks roughly normal, and the data show that most donors made small contributions.
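As a minimal sketch of the log10 transformation described above, here is a standard-library Python version. The amounts are made up for illustration and are not values from the FEC file; the half-unit bin width is also my own choice.

```python
import math
from collections import Counter

# Hypothetical contribution amounts in dollars (illustrative only, not FEC values).
amounts = [5, 25, 25, 50, 100, 250, 2700, -50]

# log10 is undefined for zero and negative values, so keep positive amounts only.
log_amounts = [math.log10(a) for a in amounts if a > 0]

# Count values per half-unit bin on the log scale, like a coarse histogram.
BIN_WIDTH = 0.5
bins = Counter(math.floor(x / BIN_WIDTH) * BIN_WIDTH for x in log_amounts)

for left in sorted(bins):
    print(f"10^{left:.1f} to 10^{left + BIN_WIDTH:.1f}: {bins[left]} contribution(s)")
```

Note that the negative amount silently drops out here, which is one reason the refunds need separate handling.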
It is interesting to see how people donate. The most frequent amount is $25, followed by $50, then $100. The minimum donation was -$84,240 and the maximum was $86,940.
To perform an in-depth analysis, I decided to omit the negative contributions, which I believe were refunds, and the contributions that exceed the $2,700 limit, because they violate the Federal Election Campaign Act and will be refunded. This means 5,897 contributions are omitted.
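The filtering rule can be sketched as follows. The amounts are hypothetical, and dropping zero-dollar rows is my own assumption, since the text only mentions negative amounts and amounts above $2,700.

```python
CONTRIBUTION_LIMIT = 2700  # dollar limit cited in the text

# Hypothetical amounts in dollars (illustrative only, not FEC values).
amounts = [25.0, -50.0, 100.0, 5400.0, 2700.0, -84240.0, 10.0]

# Keep positive amounts up to and including the limit.
# Assumption: a zero amount would be dropped as well.
kept = [a for a in amounts if 0 < a <= CONTRIBUTION_LIMIT]
removed = len(amounts) - len(kept)

print(f"kept {len(kept)} contributions, removed {removed}")
```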
I will need to add more variables, such as the candidate's party affiliation and the donor's gender and zip code.
After processing the data, I added 5 variables to help with the analysis and removed 5,897 observations because their amounts were either negative or above $2,700.
The additional variables are:
party: candidate's party affiliation.
contbr_first_nm: contributor’s first name will be used to predict gender.
gender: contributor’s gender.
Latitude: Donor’s latitude for map creation.
Longitude: Donor’s longitude for map creation.
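To illustrate how a first-name variable like contbr_first_nm could be derived, here is a small sketch. It assumes the raw name field (called contbr_nm here, a hypothetical column name) is formatted as "LAST, FIRST MIDDLE". The name-to-gender lookup is a toy stand-in with two entries, not the method used in this analysis.

```python
def first_name(contbr_nm: str) -> str:
    """Return the first given name from a 'LAST, FIRST MIDDLE' string, or '' if absent."""
    _, _, given = contbr_nm.partition(",")
    parts = given.split()
    return parts[0].title() if parts else ""


# Hypothetical lookup table; a real analysis would rely on a name-frequency dataset.
NAME_TO_GENDER = {"Mary": "female", "John": "male"}


def guess_gender(name: str) -> str:
    """Look up a first name, falling back to 'unknown' when it is not in the table."""
    return NAME_TO_GENDER.get(name, "unknown")


for raw in ["SMITH, JOHN A", "JONES, MARY", "DOE"]:
    name = first_name(raw)
    print(raw, "->", repr(name), guess_gender(name))
```

Names that cannot be parsed or matched end up as "unknown", so they can be excluded from any gender comparison rather than guessed.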
After adding the variables, I wonder what the contribution distribution looks like across parties, candidates, genders and occupations.
As of November 2016, the total number of donations made in the presidential election was close to 269K. The Democratic party received more than 243K of them, almost 10 times the number of donations made to the Republican party.
There were 25 candidates in total. Hillary Clinton led in the number of contributions, followed by Bernard Sanders, then Donald Trump.
It is interesting that many more women than men made donations, a difference of about 26%. Was it because of Hillary Clinton? We will find out later.
Who are those donors?
When we count contributions by occupation, retired people take first place, followed by people who are not employed, with teachers in third. Homemakers and engineers are among the lowest in terms of number of contributions.
It is also interesting to see when people made contributions. The date distribution appears bimodal, with one peak around March 2016 and another close to the election.
Observations:
Most people contribute small amounts of money.
The median contribution amount is $28.
The Democratic party received the largest number of donations.
Hillary Clinton had the most supporters.
About 26% more women than men made contributions.
Retired people made the largest number of contributions.
Bivariate Analysis Section
The total contribution amount made to the presidential candidates came to over 30 million US dollars in Massachusetts. We can easily see where the money went.
The Democratic party takes the majority share of donor contributions. It received more than 25.8 million US dollars in total, which is 5.6 times what the Republican party received. The picture is even worse for the Republicans when it comes to the average amount per candidate, as there were 17 Republican candidates and only 5 Democratic candidates.
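The per-candidate comparison can be worked out directly from the figures quoted above. This is only back-of-the-envelope arithmetic: the Democratic total is stated as "more than" 25.8 million, so the results are approximate.

```python
dem_total = 25.8e6   # Democratic total in dollars, from the text (a lower bound)
ratio = 5.6          # Democratic total divided by Republican total, from the text
rep_total = dem_total / ratio

dem_candidates, rep_candidates = 5, 17

dem_avg = dem_total / dem_candidates
rep_avg = rep_total / rep_candidates

print(f"Republican total:          ${rep_total / 1e6:.2f}M")
print(f"Democratic per candidate:  ${dem_avg / 1e6:.2f}M")
print(f"Republican per candidate:  ${rep_avg / 1e6:.2f}M")
print(f"Per-candidate ratio:       {dem_avg / rep_avg:.1f}x")
```

With these inputs the gap widens from 5.6 times in total to roughly 19 times per candidate, because the Republican money is split among more than three times as many candidates.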
As with the number of contributions, Hillary Clinton received the largest contribution amount, followed by Bernard Sanders, then Donald Trump.
This is no surprise, as Massachusetts is the home of the Kennedy family and has routinely voted for the Democratic party in federal elections. Hillary Clinton also has decades-deep roots in Massachusetts politics.
To see contribution patterns between parties and candidates, I start with boxplots.
However, it is very hard to compare contributions among all parties at a glance because there are so many outliers. From now on I will apply a log scale and remove the ‘others’ party, because my analysis focuses on the Democratic and Republican parties.
This is much better. Although the Republican party has the higher median and mean, the Democratic party has more variation and its distribution is more spread out. This indicates that the Democratic party has more donors at both the large and small ends.
Now the picture looks interesting. Christopher Christie, Lindsey Graham and George Pataki have the highest medians and Jeb Bush has the greatest interquartile range, while Hillary Clinton and Bernard Sanders seem to have the lowest medians. But Hillary Clinton has more outliers (deep-pocketed donors) than anyone else. Bernard Sanders has a significant number of outliers as well.
Now let’s examine within parties.
Within each party, the majority of the donations were received by only a few candidates. For the Democratic party, Hillary Clinton and Bernard Sanders together took almost 99% of all donations to the party, of which 81% went to Hillary Clinton. For the Republican party, Donald Trump led the way with 41% of all donations to the party. Donald Trump, Marco Rubio, Ted Cruz, John Kasich and Jeb Bush together took 83% of all donations to the Republican party; the remaining 17% were shared by the other 12 Republican candidates.
From the above charts, we can see who the top candidates in each party were in Massachusetts. Later I will examine in detail the candidates who received at least 9% of the total donations in their party.
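The within-party shares and the 9% cut-off can be computed along these lines. The rows and candidate names below are placeholders for illustration, not figures from the dataset.

```python
from collections import defaultdict

# Hypothetical rows of (party, candidate, amount); illustrative only.
rows = [
    ("Democratic", "Candidate A", 800.0),
    ("Democratic", "Candidate B", 195.0),
    ("Democratic", "Candidate E", 5.0),
    ("Republican", "Candidate C", 300.0),
    ("Republican", "Candidate D", 100.0),
    ("Republican", "Candidate C", 100.0),
]

party_totals = defaultdict(float)
candidate_totals = defaultdict(float)
for party, candidate, amount in rows:
    party_totals[party] += amount
    candidate_totals[(party, candidate)] += amount

# Share of each candidate within his or her own party.
shares = {
    (party, candidate): total / party_totals[party]
    for (party, candidate), total in candidate_totals.items()
}

# Candidates at or above the 9% threshold used in the text.
THRESHOLD = 0.09
top_candidates = sorted(key for key, share in shares.items() if share >= THRESHOLD)

for party, candidate in top_candidates:
    print(f"{party:<10} {candidate:<12} {shares[(party, candidate)]:.1%}")
```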
We have seen earlier that women made 26% more contributions than men. Is the same true for the amount of money donated? And do women tend to donate more to liberals and/or to a woman candidate?
On average, men donated $131 and women donated $99.8, a difference of about 30%. Women contributed much less than men whether we look at the median, the mean or the third quartile.
However, when we look at the total contribution amount by gender, the two were very close.
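A quick check with the quoted averages shows why the totals end up close even though the averages differ. I am assuming here that "26% more" means 1.26 contributions by women for every contribution by men; that reading is my own.

```python
male_mean = 131.0     # average male contribution from the text, in dollars
female_mean = 99.8    # average female contribution from the text, in dollars

men_above_women = (male_mean - female_mean) / female_mean  # gap relative to women
women_below_men = (male_mean - female_mean) / male_mean    # gap relative to men

# Assumption: women made 1.26 contributions for every 1 made by men.
count_ratio = 1.26
total_ratio = count_ratio * female_mean / male_mean  # women's total / men's total

print(f"Men gave {men_above_women:.0%} more per contribution than women")
print(f"Women gave {women_below_men:.0%} less per contribution than men")
print(f"Women's total is about {total_ratio:.0%} of men's total")
```

Under that assumption the larger number of contributions from women almost exactly offsets their smaller average, which is consistent with the near-equal totals.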
Women in Massachusetts contributed a little less than 15 million US dollars in total to the presidential campaign in 2016, of which more than 11 million dollars went to Hillary Clinton. This confirms that Massachusetts women donate more to liberals and/or to a woman candidate.
Earlier we saw that retired people made the largest number of contributions. How about the total contribution amount and the average contribution amount across the top 10 occupations?
Again, retired people take first place in total contribution amount, followed by people who are not employed, with attorneys third. However, when we look at the average contribution amount, attorneys come first and homemakers take second place (presumably most homemakers are women). Unemployed people contribute the least on average, which makes sense.
Surprisingly, software engineers in Massachusetts have been stingy, given their above-average income and their long history as a reliable source of presidential donations. Perhaps this article can answer my question.
I want to dive deeper into the distribution of contribution amounts among occupations. A boxplot sounds like a good idea, but this one is hard to read because there are so many outliers.
This looks much better. After I filtered out the outliers (extremely high donations), a boxplot confirms my observation above. The median contributions of teachers, homemakers and unemployed people are relatively low.
It is still apparent that attorneys made large contributions, with the highest average donation and the largest variability. Some of them contributed 4 times the median for their occupation.
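One common way to filter extremely high values before drawing a boxplot is the 1.5 × IQR rule, sketched below. This is an assumption on my part: the text does not say which rule was used to remove outliers, and the amounts are invented.

```python
import statistics

# Hypothetical contribution amounts for one occupation (illustrative only).
amounts = [10, 25, 25, 50, 100, 100, 250, 2700]

# Quartiles using the standard library's default ("exclusive") method.
q1, median, q3 = statistics.quantiles(amounts, n=4)
iqr = q3 - q1

# Values above Q3 + 1.5 * IQR are treated as high outliers and dropped.
upper_fence = q3 + 1.5 * iqr
kept = [a for a in amounts if a <= upper_fence]

print(f"Q1={q1}, median={median}, Q3={q3}, upper fence={upper_fence}")
print(f"kept {len(kept)} of {len(amounts)} contributions")
```

In a per-occupation comparison the fence would be computed separately for each group, so that one high-giving occupation does not set the cut-off for the others.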
Some of the interesting findings I observed in this part of the investigation:
Most of the total contribution in Massachusetts (84%) went toward the Democratic party.
There were 5 Democratic candidates and 17 Republican candidates. Therefore, there is an even bigger difference when we compare the average amount per candidate between parties.
Within each party, the majority of contributions are received by a few candidates.
In Massachusetts there are more female donors than male donors, but women donate much less than men on average.
In Massachusetts, the majority of the contributions from female donors went to the Democratic party and/or a woman candidate.
Retired people contribute the most in total amount, and software engineers and engineers are among the lowest in total contribution amount.
Lawyers had the highest average contribution amount and the greatest interquartile range; unemployed people had the lowest average contribution amount and one of the smallest interquartile ranges.
Surprisingly, homemakers had the second-highest average contribution amount, but the median contribution in this group is among the lowest. This suggests that the distribution is right-skewed with many outliers. My presumption is also that most homemakers are women.
Multivariate Analysis Section
We know that Hillary Clinton raised the most money and had the most supporters in Massachusetts. But was this true throughout the campaign? Looking at the 2 graphs above, I notice 2 things:
Bernard Sanders actually raised more money than Hillary Clinton for a few months starting in January 2016.
Bernard Sanders actually had more supporters than Hillary Clinton from January 2016 until June 2016, when he announced he would endorse Hillary Clinton, which broke his supporters’ hearts.
It is interesting to see each top candidate’s time series trend. Ted Cruz had slow and steady growth in contribution amount, which ended as soon as he suspended his campaign in May 2016. Marco Rubio dropped out even earlier, in March 2016. Donald Trump’s contributions grew steadily until around September 2016. His campaign probably did not spend a lot of money in Massachusetts.
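Time series like these can be built by summing contributions per candidate and month, then accumulating. The sketch below uses placeholder candidates, dates and amounts rather than real records.

```python
from collections import defaultdict
from datetime import date
from itertools import accumulate

# Hypothetical rows of (candidate, contribution date, amount); illustrative only.
rows = [
    ("Candidate A", date(2016, 1, 15), 50.0),
    ("Candidate B", date(2016, 1, 20), 27.0),
    ("Candidate B", date(2016, 2, 3), 27.0),
    ("Candidate A", date(2016, 2, 10), 100.0),
    ("Candidate B", date(2016, 2, 25), 15.0),
]

# Monthly totals keyed by candidate, then by (year, month).
monthly = defaultdict(lambda: defaultdict(float))
for candidate, day, amount in rows:
    monthly[candidate][(day.year, day.month)] += amount

# Running totals in chronological order for each candidate.
cumulative = {}
for candidate, by_month in monthly.items():
    months = sorted(by_month)
    running = accumulate(by_month[m] for m in months)
    cumulative[candidate] = dict(zip(months, running))

for candidate in sorted(cumulative):
    print(candidate, cumulative[candidate])
```

Comparing the monthly totals answers "who raised more in a given month", while the cumulative totals show who was ahead overall at each point in the campaign.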
It looks like more Republican donors are concentrated around the Boston area, which makes sense as Boston is the largest city in Massachusetts. But look how blue the state is!
Predictive Modeling
In this section, I will apply logistic regression to predict the party a donor contributes to, given his or her location (latitude, longitude), gender and donation amount. I will take the following steps:
Subset the original dataset, selecting only the relevant columns and filtering out the ‘other’ party.
Clean and format data.
Remove the negative sign from longitude for calculations.
Create a model to predict a donor’s contributing party based on gender, latitude, longitude and contribution receipt amount.
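The data preparation steps above might look roughly like this before the rows are handed to a logistic regression routine. The field names and records are hypothetical, and encoding Republican as 1 and male as 1 is my own convention, chosen to match how the results are interpreted.

```python
# Hypothetical records (illustrative only); the field names are assumptions.
records = [
    {"party": "democrat", "gender": "female", "latitude": 42.36, "longitude": -71.06, "amount": 25.0},
    {"party": "republican", "gender": "male", "latitude": 41.70, "longitude": -70.30, "amount": 250.0},
    {"party": "other", "gender": "male", "latitude": 42.10, "longitude": -72.59, "amount": 50.0},
    {"party": "democrat", "gender": "male", "latitude": 42.27, "longitude": -71.80, "amount": 100.0},
]

features, labels = [], []
for r in records:
    # Step 1: keep only the two major parties.
    if r["party"] == "other":
        continue
    # Steps 2 and 3: encode gender as 0/1 and drop the sign of longitude.
    features.append([
        1.0 if r["gender"] == "male" else 0.0,
        r["latitude"],
        abs(r["longitude"]),
        r["amount"],
    ])
    # Target: 1 for a Republican contribution, 0 for a Democratic one.
    labels.append(1 if r["party"] == "republican" else 0)

print(features)
print(labels)
```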
Interpreting the Results of the Logistic Regression Model
For a one-unit increase in latitude, the log odds of contributing to a Republican decrease by 0.75.
For a one-unit increase in abs(longitude), the log odds of contributing to a Republican decrease by 0.09.
For a one-unit increase in contribution amount, the log odds of contributing to a Republican increase by 0.0004.
All other variables being equal, a male donor is more likely to contribute to a Republican.
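Log odds are easier to read as odds ratios, obtained by exponentiating each coefficient. The sketch below does that for the three coefficients quoted above; the $100 step for the contribution amount is my own choice of a more readable unit.

```python
import math

# Coefficients on the log-odds scale, as quoted in the text.
coefficients = {
    "latitude": -0.75,
    "abs(longitude)": -0.09,
    "contribution amount": 0.0004,
}

# exp(beta) is the multiplicative change in the odds for a one-unit increase.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

# A $1 step is tiny, so also report the effect of a $100 increase.
odds_ratio_per_100 = math.exp(100 * coefficients["contribution amount"])

for name, ratio in odds_ratios.items():
    print(f"{name}: odds multiplied by {ratio:.4f} per unit")
print(f"contribution amount: odds multiplied by {odds_ratio_per_100:.4f} per $100")
```

An odds ratio below 1 means the odds of a Republican contribution shrink as the variable grows, and a ratio above 1 means they grow.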
Assessing the Predictive Ability of the Model
Wow! The 0.94 accuracy on the test set is a very good result. However, this result depends on the manual split of the data I created earlier, so it may not be a precise estimate.
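Accuracy is easier to judge next to a majority-class baseline. The counts quoted earlier (more than 243K Democratic donations out of about 269K) imply that always predicting "Democratic" would already be right roughly 90% of the time. That share is approximate, since the 269K figure may include donations to other parties. The labels in this sketch are made up.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true):
    """Predict the most common label for every observation."""
    most_common_label, _ = Counter(y_true).most_common(1)[0]
    return [most_common_label] * len(y_true)


# Rough Democratic share implied by the counts quoted earlier in the post.
dem_share = 243_000 / 269_000

# Hypothetical test labels (illustrative only): 9 Democratic, 1 Republican.
y_test = ["D"] * 9 + ["R"]
baseline_acc = accuracy(y_test, majority_baseline(y_test))

print(f"Implied Democratic share: {dem_share:.3f}")
print(f"Baseline accuracy on toy labels: {baseline_acc:.2f}")
```

Seen this way, the 0.94 is a gain of a few points over guessing the majority party, so metrics such as recall on Republican donors would be worth reporting alongside it.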
Some of the relationships I observed in this part of the investigation:
Closer to the election, more deep-pocketed donors supported Hillary Clinton.
Closer to the election, fewer donations went to Donald Trump.
For a certain period of time, Bernard Sanders received more donations and gained more popularity than Hillary Clinton.
Conclusion
By analyzing Massachusetts financial donation data, I found several interesting characteristics:
There is no doubt that Massachusetts is one of the bluest states.
A few candidates collected most of the donations.
Women tend to donate more to liberals and/or to a female candidate.
Bernard Sanders gained more popularity than Hillary Clinton until he gave up his run.
Future Work
The analysis I conducted covers the state of Massachusetts only. It would be interesting to analyze campaign finance data for some swing states such as Ohio or Florida, as well as campaign finance data nationwide. I am sure the picture would be very different.
Although the election is over, Americans have seen a post-election surge in donations. There will be more interesting financial contribution data to analyze.
Source code that created this post can be found here. I am happy to hear any feedback and questions.