WHEN BRITISH PRIME MINISTER Benjamin Disraeli first complained about the three kinds of lies: lies, damned lies and statistics, polls did not yet exist. Disraeli probably couldn't have imagined how appropriately his aphorism would apply to elections a century later, when the uncertainty of statistics would combine with the fog of political war to confound us all further.
Today, polls often reduce rather than enhance electoral clarity. And they've become as controversial as the campaigns themselves. With the availability of detailed poll data on the Internet, and an army of politicos and bloggers to comb the fine print for telling arcana, each round of polling numbers is now accompanied by a chorus of commentary as to what those new numbers mean.
Case in point: recent dueling polls by the two gold standards of measuring public opinion, Gallup and the Pew Research Center, diverged widely. Gallup gave Bush a 13-point advantage. Pew's survey the same day had Bush with a 1-point lead (a more recent CNN/USA Today/Gallup Poll of likely voters gave Bush an 8-point lead). Left-wing blogs quickly identified problems with the Gallup Poll. Right-wingers had their own criticism of Pew's information. The Kerry campaign pointed to the Pew and told the photographers to start getting ready for a photo finish. Bush's people tried to ride the Gallup to a multi-length lead.
As usual, the Bush camp stretched the truth much further, but the reason polls have succumbed to so much spin on both sides is that they are, unfortunately, easily spinnable. Polls are imprecise. The process is susceptible to bias, both statistical and political. Results can be shaped just as easily by methodology as by actual changes in opinion. Even the most carefully constructed poll yields only a range of possibility; and when the election is close, as it was in 2000 and will be again in 2004, Bush's and Kerry's respective ranges overlap, making all the soothsaying surrounding polls somewhat moot. The question is not what polls mean, but whether they have any meaning at all.
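To make the idea of overlapping ranges concrete, here is a minimal sketch in Python. It assumes a simple random sample and the standard 95 percent margin-of-error approximation; the function names and the poll numbers (48 percent versus 47 percent among 1,000 respondents) are hypothetical illustrations, not figures from any survey mentioned here.

```python
import math


def margin_of_error(share, sample_size, z=1.96):
    """Approximate 95% margin of error for a share from a simple random sample."""
    return z * math.sqrt(share * (1.0 - share) / sample_size)


def support_range(share, sample_size):
    """Return the (low, high) range implied by the margin of error."""
    moe = margin_of_error(share, sample_size)
    return (share - moe, share + moe)


def ranges_overlap(range_a, range_b):
    """True when two (low, high) ranges share at least one point."""
    return range_a[0] <= range_b[1] and range_b[0] <= range_a[1]


# Hypothetical close race: 1,000 respondents, 48% versus 47%.
bush_range = support_range(0.48, 1000)
kerry_range = support_range(0.47, 1000)

print("Bush range:  %.1f%% to %.1f%%" % (bush_range[0] * 100, bush_range[1] * 100))
print("Kerry range: %.1f%% to %.1f%%" % (kerry_range[0] * 100, kerry_range[1] * 100))
print("Ranges overlap:", ranges_overlap(bush_range, kerry_range))
```

With about 1,000 respondents each range is roughly three points wide on either side, so a one-point lead sits well inside the overlap. Real polls use weighting and other adjustments that this sketch leaves out.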
To be fair to Carl Friedrich Gauss, Disraeli's remarks were only half-true. When carefully applied, statistics is fairly reliable. Without the discipline, virtually none of modern science would be possible. It's in the realm of interpretation and faulty application that statistics gets its taint of doubt.
Here's how statistical theory works in polls. (Numerophobes: Fear not! There will be no equations or tests, and the background, I promise, is easy to follow.) Out there in the voting population is some proportion of people who at this moment are thinking of voting for Bush or Kerry. Thanks to the insights of Gauss and the magic of a mathematical principle called the Law of Large Numbers, we know that a decent-sized cross section of those people can provide a rough estimate of the voting plans for an entire population. The bigger the cross section, the better the estimate. This is the tried-and-true discipline of sampling, and incredibly it means that a thousand people or so can speak for a hundred million.
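Readers who would rather watch the principle at work than take it on faith can try a small simulation; everyone else can skip ahead. The sketch below assumes an imaginary electorate in which exactly half of all voters back one candidate, polls it repeatedly at several sample sizes, and reports how far the typical poll lands from the truth. The 50 percent figure, the sample sizes and the function names are all assumptions chosen for illustration.

```python
import random


def poll_once(true_share, sample_size, rng):
    """Ask sample_size random voters; return the share backing the candidate."""
    supporters = sum(1 for _ in range(sample_size) if rng.random() < true_share)
    return supporters / sample_size


def average_miss(true_share, sample_size, trials, rng):
    """Average absolute gap between the poll result and the true share."""
    total = 0.0
    for _ in range(trials):
        total += abs(poll_once(true_share, sample_size, rng) - true_share)
    return total / trials


TRUE_SHARE = 0.50          # assumed true level of support
rng = random.Random(2004)  # fixed seed so the run is repeatable

misses = {}
for sample_size in (100, 1000, 10000):
    misses[sample_size] = average_miss(TRUE_SHARE, sample_size, 200, rng)
    print("sample of %5d: typical miss of %.2f points"
          % (sample_size, misses[sample_size] * 100))
```

The typical miss shrinks as the sample grows, but only with the square root of the sample size: a sample one hundred times larger is about ten times more accurate. That is why pollsters tend to settle for a thousand or so respondents.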
Or can they? For the theory to work in practice (always the kicker with theories), the sample has to represent the larger population it is intended to describe. If a poll has more Republicans than Democrats, or reaches an unusual number of retired people, college students, African-Americans or any other subdivision within the electorate that has a specific voting tendency, then the answer will be prejudiced. This is called sampling bias, and it is a fact of statistical measurement.
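A few lines of arithmetic show how much a lopsided sample can move the headline number. The sketch below is purely illustrative: the three groups, their levels of support for a candidate and their shares of the electorate and of the sample are all made-up assumptions, not data from any real poll.

```python
def blended_share(group_support, group_mix):
    """Overall support given each group's support and its weight in the mix."""
    total_weight = sum(group_mix.values())
    weighted = sum(group_support[g] * group_mix[g] for g in group_support)
    return weighted / total_weight


# Hypothetical support for one candidate within each group.
candidate_support = {"republicans": 0.90, "democrats": 0.10, "independents": 0.50}

# Hypothetical makeup of the real electorate versus a skewed sample
# that happened to reach a few too many Republicans.
electorate_mix = {"republicans": 0.35, "democrats": 0.35, "independents": 0.30}
sample_mix = {"republicans": 0.40, "democrats": 0.30, "independents": 0.30}

true_share = blended_share(candidate_support, electorate_mix)
polled_share = blended_share(candidate_support, sample_mix)

print("True support:   %.1f%%" % (true_share * 100))
print("Polled support: %.1f%%" % (polled_share * 100))
print("Skew:           %.1f points" % ((polled_share - true_share) * 100))
```

Under these assumptions, a five-point surplus of Republicans in the sample turns a dead heat into a four-point lead, without a single voter changing his or her mind.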
"That's the difficulty," explains Lynn Vavreck, a professor of political science at UCLA who specializes in quantitative methods. "Polls, which rely on calling people and getting them to answer questions, have many sampling pitfalls. The polling firms take a randomized list of 5,000 numbers and start calling in order to get their 1,000 responses. From a scientific standpoint, you need to reach every person on that list to avoid a selection problem. But they throw out cell phones. That might have some effect. And the bigger issue is that not everyone answers. Response rates are getting lower and lower."
In the end, the respondents are the people who picked up the phone and felt like talking. And who those people are could be influenced by all kinds of things, like the days covered by the poll, the wording of the questions, the timing of the calls, a holiday weekend, or whether or not Bush is on television. That may be what happened with the 10-point lead Bush had in the Newsweek poll right after the RNC: the poll was conducted partly during the convention and may have reached more Republicans, who were watching at home and eager to talk politics.