The Top Five Reasons Polling Failed to Predict the 2016 Presidential Election
With nearly every major political pollster getting the outcome of the 2016 United States Presidential Election between Hillary Clinton and Donald Trump wrong, many people want to understand how this happened. We compiled our top five reasons that the polling failed to forecast the winner accurately.
I would like to note that my firm, Mindspot Research, although a global provider of both qualitative and quantitative Marketing Research, has a policy of declining political polling work. However, given our expertise in research methodology, we have considered the potential underlying causes.
The United States is changing: the speed of decisions is accelerating, how much people have to do in a day is increasing, and access to information, and how quickly it is received, has grown exponentially in the last four years. And let's face it, polling has changed somewhat, but perhaps not at the level that society has changed. As a result, there is a fundamental shift in who you are reaching to get that polling answer, and in their level of engagement and commitment to the candidates and to casting a vote at all.
We believe there are at least five possible reasons that the political polling was wrong and the 2016 Presidential Election produced a shocking result:
Confirmation Bias: At least for the Democratic polls, this is likely a large contributor to not questioning the outcome or the polling model. When a result is expected, or an outcome is agreed on, the rigorous analysis that would normally be applied is often forgotten. Confirmation bias happens when we, or those conducting the polling, unknowingly accept flawed polls without being critical of the methodology, simply because the polls delivered the result that was expected or the result with which they agree.
Sampling Error: This is likely another major contributing factor in the polls not predicting the outcome of the 2016 election. I've heard that people are discounting sampling error as a major contributor; however, most of them are not life-long researchers. Here's why it matters: polling began using random sampling in the 1940s, meaning that everyone of voting age had a known probability of being included in an election poll, and random sampling is still the traditional polling methodology. Within the sampling plan, however, it's possible that there has been a shift from a probability sample toward a non-probability sample. Polling has been migrating from telephone polling to online polling over the last 10 years. This changes the way people are reached, and with trends driving polling to the internet, it's possible that more responders are self-selected, meaning the respondents may have volunteered to be polled. That turns the sample from a randomly selected probability sample into a potentially non-probability sample, which is less predictive. This may be fine if it had been previously tested and multiple outcomes verified; if not, it is a departure from the traditional methodology. And if that is the case, from a Marketing Research perspective, a stratified sampling plan may have reduced the sampling error. A stratified sampling plan uses "layers" to segment the population. If, for example, separate layers for rural and city geographies had been built into the plan, that would have helped predict the differences that came through so clearly in the election. Stratified sampling ensures that the diverse layers of the population are represented. Further, the typical sample comes from large population centers, i.e., metropolitan areas.
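The stratified-sampling idea above can be sketched in a few lines of Python. This is a minimal illustration only; the 80/20 metro-rural population split and the 1,000-person sample size are invented numbers, not figures from any actual poll:

```python
import random

random.seed(0)

# Hypothetical population: 80% metro, 20% rural (illustrative numbers only).
population = ([{"region": "metro"} for _ in range(8000)]
              + [{"region": "rural"} for _ in range(2000)])

def stratified_sample(population, strata_key, n):
    """Draw n respondents so each stratum (layer) appears in
    proportion to its share of the population."""
    strata = {}
    for person in population:
        strata.setdefault(person[strata_key], []).append(person)
    sample = []
    for members in strata.values():
        quota = round(n * len(members) / len(population))
        sample.extend(random.sample(members, quota))
    return sample

sample = stratified_sample(population, "region", 1000)
rural_share = sum(1 for p in sample if p["region"] == "rural") / len(sample)
print(f"rural share of sample: {rural_share:.0%}")  # matches the 20% population share
```

Because each layer gets an explicit quota, rural respondents cannot be crowded out of the sample the way they can be in a pooled, self-selected online panel.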
And typically very few respondents are polled from rural or outlying areas, yet those rural voters came out in droves, for whatever reason. Keep in mind that votes are 1:1, not multiplied by, say, usage; a rural vote carries exactly the same weight as a metro vote. When you poll a majority-metro area and voter base, the poll will include more Democrats and be biased toward the Democratic side, and that bias becomes noticeable when metro turnout rates fall below rural turnout rates. As if this weren't enough, another type of sampling error, weighting error, was likely in play. Data is weighted to represent populations, or to keep from under- or over-estimating certain populations. It's possible the sample size, the number of people polled, was not large enough for the population being projected to, or that a misaligned weighting scheme was in place. The old model of population weighting was likely being used versus newer, more unconventional models. An article on the Trump team's data analytics alluded to weighting rural areas in an unconventional way that was more predictive than anything else being done. This points to the potential of flawed weighting, or non-weighting, in conventional polling.
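To make the weighting point concrete, here is a minimal Python sketch. The respondent counts, candidate-preference shares, and turnout model below are all hypothetical numbers invented for illustration; the point is only how the same raw poll data produces different estimates under different weighting assumptions:

```python
# Hypothetical raw poll: 700 metro and 300 rural respondents,
# with assumed candidate-preference shares (illustrative numbers only).
poll = {
    "metro": {"n": 700, "dem_share": 0.60},
    "rural": {"n": 300, "dem_share": 0.35},
}

# Unweighted estimate: simply pool all respondents.
total_n = sum(g["n"] for g in poll.values())
unweighted = sum(g["n"] * g["dem_share"] for g in poll.values()) / total_n

# Weighted estimate: re-scale each group to an assumed share of
# actual turnout (a hypothetical turnout model, not real data).
turnout_share = {"metro": 0.55, "rural": 0.45}
weighted = sum(turnout_share[k] * poll[k]["dem_share"] for k in poll)

print(f"unweighted: {unweighted:.1%}, weighted: {weighted:.1%}")
```

With these invented numbers the pooled estimate is about 52.5% while the turnout-weighted estimate is about 48.8%: if rural turnout runs higher than the weighting scheme assumes, the conventional poll overstates the metro-leaning candidate.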
Embarrassment and Malice: Yes, people were embarrassed to say who they were voting for, so they didn't tell the truth or didn't respond to the poll. We did not make this up; this voter has been called "the shy Trumpster." Further, there have been reports that people lied to pollsters about their voting intentions to deliberately throw off the poll results. Additionally, there is a phenomenon we like to call "your vote is my vote." There has been an increase in the number of absentee ballots being submitted; Florida, for example, had more people requesting ballots than at any time during the 2012 election. Yes, perhaps more people voted, but it also introduces the opportunity for someone to vote twice. Take this example: one spouse, heading to the post office to drop off absentee ballots, says, "Honey, I filled yours out – sign here." We've heard from more than a few people that this happened.
Non-response Bias: This is likely a major contributor to the 2016 presidential election results. The pollsters assumed, based on their modeling, that a greater percentage of voters would come out to vote. In this election most people did agree on one thing: it was polarizing. There were committed supporters on both sides; however, a significant portion of the population voted for a third party in protest or didn't vote at all. Those non-voters are a significant factor; the potential voters who definitely couldn't vote for either major political party played a large role here by abstaining or casting a protest vote. The polarized activists on both sides therefore seemed to be the only dependable voters, and Trump polarized far more people by courting angry voters with his relatively extreme campaigning strategy.
Commitment and Polling Survey Design: Consider what might have been learned if a series of questions had been asked measuring the degree to which someone is committed both to the candidate they are voting for and to voting at all, regardless of variables such as weather. That is very much what happened in the UK and was likely one of the root causes of the Brexit result. When people are committed, they are more likely to push through discomfort; when they are not, they take the outcome for granted. This appears to be what happened with working-class white males. Given the tone and manner of this election, commitment could even have been measured by asking how angry a voter is and how that anger motivates the commitment and the vote. In the UK, those angry voters showed up in droves despite the rain and inclement weather, while their wealthier counterparts stayed inside, only to wake up the next day with their assets worth significantly less than when they went to sleep. Additionally, a survey design could capture the short list of points a voter bases their decision on; if a candidate hits 100% of those few things, all of that candidate's shortcomings will be ignored. There are ways to design the survey instrument to capture this, so the polling survey design may have benefited from including the variables on which voters actually make decisions.
This will likely take months for those who do political polling to diagnose; however, I would submit that all of the above variables contributed to some degree. Of course, there are likely more than our top five, but what's included above may explain a lot. I suspect that the largest contributors will prove to be non-response bias and sampling plan error. That said, most research, even the best Marketing Research, has a margin of error (polling error), and typically that error falls within an acceptable tolerance. Political polling is evolving into predictive modeling, and honestly, in the age of big data, whoever has the best predictive model stands a better chance of being closer to the actual outcome. It's not only about the best candidate anymore; it's about who has the most committed voters, what motivates those voters to action, and which candidate has the best team to measure those voters and predict the outcome accurately enough to guide strategic campaign decisions.
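For reference, the margin of error mentioned above has a standard textbook formula for a simple random sample. This small Python sketch computes it for a hypothetical 1,000-person poll showing a 48% share (both figures are invented for illustration). Note that the formula assumes a true probability sample; for self-selected online samples, the real uncertainty can be considerably larger than the stated margin:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a
    proportion p estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical poll: 1,000 respondents, 48% support (illustrative numbers).
moe = margin_of_error(0.48, 1000)
print(f"margin of error: +/-{moe:.1%}")
```

With these numbers the margin works out to roughly three percentage points, which is why a race polling within a few points was never as settled as the headline numbers suggested.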
Lynnette Leathers is the CEO of Mindspot Research, headquartered in Orlando, Florida.