From her cubicle in a quiet New Jersey office park, Maryann Hand called voter after voter in Iowa with this public-service appeal:
“Your participation is very important because only 500 people have been randomly selected for this survey,” she read from a script. “Your views will represent many people throughout the state.”
But would they? It was the week before the Iowa caucuses, and most people chose not to speak with Hand and her colleagues in the Mount Laurel office of Braun Research, calling on behalf of the Monmouth University Polling Institute. After four evenings they finally got 544 people, with the biggest share — 23% — saying they favored Joe Biden.
Yet in the actual caucuses, as organizers sorted out the chaos from a balky vote-counting app, Biden’s support appeared lower than that, behind Pete Buttigieg and Bernie Sanders. Did Monmouth (and plenty of other pre-Iowa polls) get it wrong?
Not at all, if you consider what polling really measures.
Polling is not designed to predict what will happen on an election day. It merely provides an estimate of what people are thinking at a given moment before election day, within that range you hear about so often on the news: the margin of error.
Here’s how it works.
The fundamental math behind sampling has been well established for centuries, allowing pollsters to get a reliable sense of how a large population of people is thinking by calling just a few hundred of them, provided they are chosen at random.
An explanation would require a blackboard full of equations, but the end result makes intuitive sense. The more people a pollster calls, the higher the chance of getting an accurate result — that is, the narrower the margin of error. If we polled everyone, we would know how everyone felt — a margin of error of zero. But even if we survey just 500 people in a large population, it turns out the margin of error is still fairly narrow. Do the calculations, and the margin for this sample size is plus or minus 4.4 percentage points, using what’s called a 95% confidence interval.
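For readers who want to see the calculation, here is a minimal sketch in Python. It assumes a simple random sample and the most cautious case, in which support is split 50-50; the function name and the 1.96 multiplier (the standard value for a 95% confidence interval) are illustrative choices, not something taken from Monmouth's own methodology.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Margin of error, in percentage points, for a simple random sample.

    proportion=0.5 is the worst case (widest margin); z=1.96 corresponds
    to a 95% confidence interval.
    """
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    return 100 * z * standard_error

moe_500 = margin_of_error(500)
print(round(moe_500, 1))  # 4.4
```

The sketch ignores weighting and other real-world adjustments, which tend to widen the margin somewhat.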
That means if you could somehow conduct the poll again and again an infinite number of times — calling a new group of 500 people each time — the result would fall within 4.4 points of the true number on 95% of those occasions.
Here’s an example: Let’s say that in a group of one million people, the actual level of support for Candidate X is 42%, though of course we don’t know that. The first time we poll 500 people, perhaps 41% of them will say they like the candidate. Maybe next time, we’d get 43% or 44%, followed by 42% a few times in a row. Over time, about 95% of the results would fall between roughly 38% and 46%, that is, within 4.4 points of the true 42%. Plot all those points on a graph, and results cluster toward the center, forming a shape called a bell curve.
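The repeated-poll thought experiment can be tried on a computer. The sketch below is a simulation under stated assumptions: each respondent is treated as an independent draw with a 42% chance of backing Candidate X (a close stand-in for sampling from a million people), and 2,000 simulated polls stand in for "again and again." The seed and all names are arbitrary.

```python
import random

TRUE_SUPPORT = 0.42   # the true share from the example; unknown to a real pollster
SAMPLE_SIZE = 500
MARGIN = 4.4          # percentage points
NUM_POLLS = 2000      # assumption: a finite stand-in for "infinite" repetition

def run_poll(rng):
    """Simulate one poll and return the percentage who back Candidate X."""
    supporters = sum(rng.random() < TRUE_SUPPORT for _ in range(SAMPLE_SIZE))
    return 100 * supporters / SAMPLE_SIZE

rng = random.Random(2020)  # fixed seed so reruns give the same numbers
results = [run_poll(rng) for _ in range(NUM_POLLS)]

# Count how many simulated polls landed within the margin of the true value.
within = sum(abs(r - 100 * TRUE_SUPPORT) <= MARGIN for r in results)
share_within = within / NUM_POLLS
print(f"{share_within:.1%} of simulated polls landed within {MARGIN} points")
```

The printed share should come out close to 95%, and a histogram of `results` would show the bell shape described above.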
The more people a pollster calls, the narrower the margin of error. But there are diminishing returns, said Mack C. Shelley II, an Iowa State University professor of political science and statistics. Calling 2,000 people — four times as many — would reduce the margin of error only by half.
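A short sketch makes the diminishing returns concrete. It uses the textbook formula for a simple random sample at 95% confidence with support assumed to be split 50-50; the sample sizes beyond 500 and 2,000 are added only for illustration.

```python
import math

def margin_of_error(sample_size, z=1.96):
    """Worst-case margin of error (50-50 split) in percentage points."""
    return 100 * z * math.sqrt(0.25 / sample_size)

# Each step up in sample size buys a smaller improvement than the last.
margins = {n: margin_of_error(n) for n in (500, 1000, 2000, 8000)}
for n, moe in margins.items():
    print(f"{n:>5} people: +/- {moe:.1f} points")
```

Under these assumptions the margins come out to roughly 4.4, 3.1, 2.2, and 1.1 points: because the margin shrinks with the square root of the sample size, every halving costs four times as many calls.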
Be wary of the “horse race” trap. In the Monmouth poll, the four candidates with the highest totals (Biden, Sanders, Buttigieg, and Elizabeth Warren) were bunched so closely together that institute director Patrick Murray could not say which was ahead. Because each candidate’s number carries its own 4.4-point margin of error, a candidate needs to be at least 8.8 points ahead of an opponent for a statistician to say with confidence that he or she is truly in the lead.
The same goes for saying that a candidate has “gained” from one poll to the next. Unless the percentage changes by a lot, be careful.
“Polls were never the precise measures that the media tended to portray them as,” Murray said.
As for why the Monmouth poll and other pre-Iowa results did not match the caucuses, Murray’s results indicate that many voters made up their minds after the polls were taken (a common occurrence in early-stage primaries, with lots of candidates). What’s more, turnout appeared to be low among voters from some demographic groups.
Which brings us to weighting. This is a statistical technique a pollster uses to ensure that the demographics in a sample of 500 people mirror those of the population at large.
Let’s say that our group of 500 includes 250 women — 50%. But if the voting population is actually 51% female, we should have called 255 women. To compensate, the pollster would give a bit more statistical weight to each woman’s response. Instead of counting as a response from one person, it would count as 1.02 (255 divided by 250).
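The weighting arithmetic can be sketched in a few lines of Python. The 250 women and the 51% target come from the example above; the men's figures (250 in the sample, 49% of voters) follow from it, while the candidate-support counts are purely hypothetical numbers chosen to show the effect.

```python
# Sample makeup and target population shares from the example.
sample_counts = {"women": 250, "men": 250}
population_share = {"women": 0.51, "men": 0.49}
total = sum(sample_counts.values())

# Weight = how many people the group should have had / how many it did have.
weights = {
    group: population_share[group] * total / sample_counts[group]
    for group in sample_counts
}

# Hypothetical counts of respondents backing a candidate, by group.
supporters = {"women": 125, "men": 100}

unweighted = 100 * sum(supporters.values()) / total
weighted = 100 * sum(weights[g] * supporters[g] for g in supporters) / total
print(weights)               # women about 1.02, men about 0.98
print(unweighted, weighted)  # about 45.0 vs. about 45.1
```

Here the adjustment is tiny because the sample was only slightly off. Real polls usually weight on several characteristics at once, which takes more elaborate methods than this one-variable sketch.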
The key is to do this for the right characteristics — those that tend to be associated with varying levels of support for political views, said Courtney Kennedy, director of survey research at the Pew Research Center in Washington.
If women under age 40 are more likely to vote a certain way, then pollsters need to make sure they account for how many people meet that description. Age, gender, party affiliation, and ethnicity all are commonly used to weight samples. Pollsters also try to identify people who are likely voters, either by asking them or using voting records.
Decades ago, before the advent of caller ID and cell phones, pollsters commonly achieved response rates above 50%. That is, not only did that many people answer the phone, but they also agreed to answer the questions.
These days, pollsters are happy if they succeed with just 5% of the cell-phone users they try to call. And calls to land lines are not much better — generally below 10%.
The solution is simple: Call more people. But once again, there is the question of weighting. It turns out that those few people who do answer phone calls from unfamiliar numbers are more likely to have a college degree. And in 2016, people with college degrees were more likely to favor Hillary Clinton. That meant skewed results for state-level polls that failed to ask about education and weight the results accordingly.
Other pollsters have turned to online surveys, which in theory will work just as well — provided they account for any differences between those who agree to participate and those who do not (or cannot, if they lack internet access).
Pollsters sometimes grapple with a perception that they try to steer people with leading questions. With certain policy issues (“Do you think a soda tax is a bad idea?”), that might be true if the survey is funded by an advocacy group. Don’t be afraid to ask who is paying for the survey.
But the big-name pollsters, such as those with ties to universities and media outlets, are doing their best to play it straight. In addition to using neutral language, they generally rotate the order in which candidates’ names are presented. Warren might be first one time, Amy Klobuchar the next.
That’s because people who have not yet made up their minds, when forced to make a choice, are slightly more likely to pick the first or last name they hear — not one in the middle, Monmouth’s Murray said.
If you have never been polled, the reason is, again, math. As an example, look at Pennsylvania, which has 8.5 million registered voters.
Even if pollsters were to conduct 50 polls between now and November, calling 500 people each time, that would add up to at most 25,000 voters — even fewer if by some chance anyone was called by multiple polls.
That number represents just three-tenths of 1% of the total number of voters. Most people will not hear from a pollster this year — or if they do, they might not recognize the number and decline to pick up.
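The Pennsylvania arithmetic is easy to check. This sketch uses only the figures given above and assumes, as the best case, that no voter is called twice; the variable names are illustrative.

```python
registered_voters = 8_500_000   # Pennsylvania figure cited above
polls = 50
people_per_poll = 500

# Upper bound on distinct voters reached, assuming no one is called twice.
max_called = polls * people_per_poll
share = max_called / registered_voters

print(f"{max_called:,} voters is at most {share:.2%} of the electorate")
# 25,000 voters is at most 0.29% of the electorate
```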