By Lakshmi Chandrasekaran
In the end, it came down to will she or won’t she?
What seemed like a comfortable 81 percent chance of Hillary Clinton winning the election just a couple of weeks ago morphed into a tight race to the finish after FBI director James Comey announced an investigation of a new round of Clinton’s emails. But she was still expected to win the presidency – until the reality of the red wall of electoral votes gave the victory to Donald Trump.
Predicting the outcomes of complex events such as elections using data and statistical calculations is what statistician and journalist Nate Silver does on a regular basis through his news website fivethirtyeight.com. He and his team collate mountains of live polling data and crunch these numbers through different statistical and mathematical models to arrive at their predictions. According to the election analysis on Silver’s website, Trump was stronger wherever the economy was weaker, with slower job growth and lower wages, and he outperformed Clinton in all those counties.
Silver, a statistician by training, started his career analyzing numbers for baseball. But he catapulted to national fame for hitting the bull’s-eye, successfully predicting who the president would be in the 2008 and 2012 elections. His 2012 book ‘The Signal and the Noise’, on the art of predicting, hit the New York Times bestseller list. Last month, as part of the ‘One Book One Northwestern’ program, he talked about his book at Northwestern University.
Here’s what he said, though we clearly need a new book now. “Prediction is intrinsic to the scientific method, where we are all kind of flailing around and trying to figure out what the truth really is, what’s subjective and what’s not,” said Silver in an interview before his talk. “So to me, a prediction is central to the process of gaining knowledge,” he added. But it still relies on theories and abstract models.
In his talk to a packed auditorium at Northwestern University, Silver noted a shocking statistic – that 90 percent of the data available in the world was recorded in the past couple of years! And we are now grappling with the issues of how to analyze these vast volumes of “big data.”
“As far as forecasting complex events such as earthquakes and terrorism, people have made very little progress since it’s never as easy as a push button solution. One of the biggest challenges of big data is that there is more room for interpretation and errors,” he said.
Taking the case of the election predictions, “newspapers such as the New York Times, Chicago Tribune etc. have been making a lot of predictions as to how the presidential race is shaping up and their implications,” said Silver. He continued, “Our role is not to say we can predict everything perfectly but instead to be able to say when things are more or less predictable and what are realistic scenarios versus plausible and unrealistic versus really impossible scenarios.”
Silver emphasized that fivethirtyeight.com has been more on the cautious side, predicting that Hillary Clinton had an 80 percent or even 75 percent chance of winning, as opposed to other polls. However, he mentioned that it is not always easy to make successful predictions because ‘uncertainty’ gets in the way.
And what is uncertainty? Let us take the example of the 2016 election, which had a lot more ambiguity than previous elections. Silver said, “Younger voters are a major source of uncertainty. The way it shows up is a higher number of undecided voters. We are not going to have many millennials who vote for Donald Trump, but they could vote for Jill Stein or Gary Johnson or decide not to vote.” Silver warned that if the Bernie Sanders voters from the primaries did not go out to vote on the day of the election, then Clinton would be at risk.
In fact, the millennial vote split at 55 percent for Clinton and 37 percent for Trump, and the rest – 8 percent – voted for a variety of other candidates.
Since election polls are all based on a certain sample size, one of the other major components affecting the ability to correctly predict elections is calculating the margin of error, said Silver. In statistics, the margin of error calculation enables us to gauge the “accuracy” of a poll and, in turn, assess the uncertainty of a prediction.
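To make the link between sample size and margin of error concrete, here is a minimal sketch in Python. It uses the standard textbook formula for a proportion from a simple random sample, which is an assumption for illustration and not a description of how fivethirtyeight.com builds its forecasts. The poll numbers (1,000 respondents, 50 percent support) and the function name `margin_of_error` are hypothetical.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a
    proportion, assuming a simple random sample (textbook formula)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must be between 0 and 1")
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# Hypothetical poll: these figures are illustrative, not from the article.
sample_size = 1000
candidate_share = 0.50

moe = margin_of_error(candidate_share, sample_size)
print(f"{candidate_share:.0%} +/- {moe:.1%}")
```

Under these assumptions the margin comes to about 3 percentage points, and quadrupling the sample size only halves it, which is one reason a single poll rarely settles a close race.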
Drawing on the weather forecasting and disaster examples in his book, Silver said, “In April 1997, the Red River flooded a town in North Dakota as the state had faced heavy snowfall that year. Although it didn’t result in any loss of life, it resulted in the evacuation of thousands of residents with cleanup costs running into billions. This damage could have been mitigated.” Yes, the National Weather Service had predicted that this flood would crest at an all-time high of 49 feet. What they missed in this prediction was the “uncertainty.”
“It turns out, in this case, the margin of error calculation resulted in +9 to -9 feet,” said Silver, meaning the river could have crested anywhere between 40 and 58 feet. Reported without that range, the prediction looked much too “perfect.” The levees were built to withstand a maximum of 51 feet, but that year the water crested at 54 feet, defying the Weather Service’s precise prediction.
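The arithmetic behind the Red River example can be sketched in a few lines of Python. The four figures (49, 9, 51 and 54 feet) come from the account above. The second half of the sketch goes further than the article: it assumes, purely for illustration, that forecast errors are normally distributed and that the plus-or-minus 9 feet is a 90 percent interval. Neither assumption is stated in the article, so the resulting probability is only indicative.

```python
from statistics import NormalDist

# Figures quoted in the article.
forecast_crest_ft = 49.0
margin_ft = 9.0
levee_height_ft = 51.0
observed_crest_ft = 54.0

# Range implied by the forecast plus or minus its margin of error.
low_ft = forecast_crest_ft - margin_ft
high_ft = forecast_crest_ft + margin_ft
levee_within_range = low_ft <= levee_height_ft <= high_ft
observed_within_range = low_ft <= observed_crest_ft <= high_ft

# ASSUMPTION (not from the article): errors are normal and the
# +/- 9 ft margin corresponds to a 90% interval.
coverage = 0.90
z = NormalDist().inv_cdf(0.5 + coverage / 2)
sigma_ft = margin_ft / z
crest = NormalDist(mu=forecast_crest_ft, sigma=sigma_ft)
p_overtop = 1.0 - crest.cdf(levee_height_ft)

print(f"Implied range: {low_ft:.0f} to {high_ft:.0f} ft")
print(f"Levee height inside range: {levee_within_range}")
print(f"Observed crest inside range: {observed_within_range}")
print(f"Chance of exceeding the levee (under the assumptions): {p_overtop:.0%}")
```

Both the levee height and the actual crest fall inside the 40 to 58 foot range, and under the stated assumptions the chance of the water topping the levees was roughly one in three. A forecast of "49 feet" on its own conveys none of that risk.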
Silver plans to predict much more than sports and elections on fivethirtyeight.com. “We’d like to look at areas of criminal justice for example, which is a case where, for a long time, you didn’t have very good data. There’s pressure now to get more data,” he said, referring to some underrepresented areas of prediction. Education, public policy and urban planning are other areas that Silver said he would like to explore.
Talking about his inspiration to create the website, Silver said, “It was partly due to my frustration with the way elections were covered by media.” He also cautioned about the dangers of publishing breaking news too quickly in today’s fast-paced world. “It’s important to understand what’s going on first before hyping stories. The quick turn-around time in a journalistic sphere is misguided,” he said, adding that it is critical to maintain objectivity while reporting news.
Silver highlighted, without uncertainty, the need for more STEM graduates in the U.S. who would be trained in quantitative skills needed to promote the future of data journalism. “Young journalists need to understand the importance of data visualization, which is another valuable skill to have,” he said.
Silver also stressed the importance of having people trained in cross-disciplinary fields. “I think we will see that happening but we can’t solve a big human capital problem overnight. It will happen over a generation or half a generation,” he said.