(Inside Science Currents Blog) -- It's NCAA Men's Basketball Tournament time and fans have lots of questions. Will Kentucky win it all and finish the season undefeated? Will one of the "First Four" – the teams that begin tournament play on Tuesday night – become a Cinderella story during this year's March Madness games? What's the best selection method for a pool-winning bracket?
Well, the tournament comes just once a year. If you were betting on a computer simulation of the tournament that was run 1,000 times, the probabilities would even out, and the strongest prediction method would have a clear advantage. But the real tournament is played only once, as a series of single games. That one-game, winner-moves-on structure is a big reason why even top statisticians can't be sure they would predict any given year's tournament better than someone who knows nothing about basketball.
NCAA Tourney Meets Statisticians
You can, however, learn something from trying to improve your predictions. In fact, the current issue of the Journal of Quantitative Analysis in Sports is made up of research papers that describe different methods to predict the outcome of the tournament.
All but one of the papers were written by groups that entered a competition during last year's tournament. The contest was sponsored by Intel and hosted on Kaggle, a website that often runs prediction contests, and it required all 248 entrants to compute and submit a win probability for each of the more than 2,000 games that could possibly occur in the tournament. Entries were then scored against what actually happened in the 63 games of the 2014 NCAA tournament, and the winning entry was the one whose probabilities came closest to the actual results.
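The article doesn't name the exact scoring rule. As an assumption for illustration, the sketch below uses logarithmic loss, a rule commonly used in Kaggle prediction contests: each entry is charged the negative log of the probability it gave the team that actually won, averaged over the games played. The function name `score_entry`, the four-team field and the three results are all hypothetical; the point is that an entry must cover every possible pairing, but only the games that really happen get scored.

```python
import math
from itertools import combinations

def score_entry(entry, results, eps=1e-15):
    """Average log loss of one entry over the games actually played (lower is better).

    entry:   dict mapping (team_a, team_b) -> probability that team_a beats team_b,
             with one key per possible pairing
    results: list of (winner, loser) tuples for the games that were played
    """
    total = 0.0
    for winner, loser in results:
        # Look up the probability the entry gave to the team that actually won.
        if (winner, loser) in entry:
            p = entry[(winner, loser)]
        else:
            p = 1.0 - entry[(loser, winner)]
        p = min(max(p, eps), 1.0 - eps)  # clip so log(0) can never occur
        total += math.log(p)
    return -total / len(results)

# Hypothetical four-team field: each entry covers all six possible pairings,
# but only the three games actually played are scored.
teams = ["A", "B", "C", "D"]
pairs = list(combinations(teams, 2))
results = [("A", "B"), ("D", "C"), ("A", "D")]

cautious = {pair: 0.6 for pair in pairs}
bold = {pair: 0.95 for pair in pairs}
coin_flip = {pair: 0.5 for pair in pairs}

for name, entry in [("cautious", cautious), ("bold", bold), ("coin flip", coin_flip)]:
    print(name, round(score_entry(entry, results), 3))
```

Here the cautious entry scores about 0.646, the coin flip exactly ln 2 (about 0.693) no matter what happens, and the bold entry about 1.033, because one confident miss on the D-over-C upset costs it heavily. Under this kind of rule, overconfidence can do worse than guessing.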
Andrew Hoegh and Marcos Carzolio, both graduate students in statistics at Virginia Tech in Blacksburg, were part of a group that entered the Kaggle contest and then wrote a research paper outlining their method. In short, they ranked the teams and then added a layer to account for what they called matchup effects.
For the ranking stage, they looked at the teams' season-long performances to determine the general strength of each team. From there they could make an initial prediction of what would happen when Team A played Team B: who would win, and by how many points? Next they looked at the attributes of each team, such as whether it played fast or slow, had tall or short players, or shot well from 3-point range, among other variables. They then grouped similar teams into clusters. With the clusters in hand, they could look at how Team A had performed against teams similar to Team B and compute what they called a residual: how far off the initial prediction had been in those games. That difference reflects the matchup effect for those two particular teams.
"The idea is that you have an initial model and it gives predictions, and then you get residuals from those predictions," said Carzolio. "And then for teams that are alike, we make corrections based on how off our predictions were originally."
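The two-stage idea Carzolio describes can be sketched roughly as follows. This is a simplified illustration, not the group's actual model: the ratings here are just each team's average scoring margin, the cluster labels are hand-assigned stand-ins for a real clustering on pace, height, 3-point shooting and so on, and the function names and game data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def season_ratings(games):
    """Crude strength rating: each team's average scoring margin over the season.

    games: list of (team, opponent, margin), margin = team's points minus opponent's.
    """
    margins = defaultdict(list)
    for team, opp, margin in games:
        margins[team].append(margin)
        margins[opp].append(-margin)
    return {team: mean(ms) for team, ms in margins.items()}

def matchup_prediction(team_a, team_b, games, clusters):
    """Return (initial, corrected) predicted margin for team_a against team_b.

    The correction is team_a's average residual (actual minus predicted margin)
    in past games against opponents in the same cluster as team_b.
    """
    ratings = season_ratings(games)
    initial = ratings[team_a] - ratings[team_b]
    residuals = []
    for team, opp, margin in games:
        # Re-orient each game from team_a's point of view; skip games it didn't play.
        if team == team_a:
            other, actual = opp, margin
        elif opp == team_a:
            other, actual = team, -margin
        else:
            continue
        if clusters[other] == clusters[team_b]:
            predicted = ratings[team_a] - ratings[other]
            residuals.append(actual - predicted)
    correction = mean(residuals) if residuals else 0.0
    return initial, initial + correction

# Hypothetical season; cluster labels stand in for a real clustering step.
games = [("A", "X", 10), ("A", "Y", -2), ("B", "X", 4), ("B", "Y", 6), ("X", "Y", 3)]
clusters = {"A": "fast", "B": "slow", "X": "fast", "Y": "slow"}
print(matchup_prediction("A", "B", games, clusters))
```

In this toy season the ratings alone have A losing to B by about a point, but A badly underperformed its prediction against Y, the other "slow" team, so the matchup correction pushes the forecast to a loss of roughly nine points.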
Carzolio said that before entering the competition, he thought TV analysts were often off-base when they explained why a certain team would over- or under-perform against another. But the project shifted his view.
"There are actually matchup effects where different teams that have different strengths will match up well against other teams that have weaknesses in those categories. That's pretty interesting," said Carzolio.
His group finished ahead of about 75 percent of the entries in the Kaggle competition. In a separate research paper, the pair that finished first went out of their way to point out how much luck was involved in their victory.
The Luck of the Draw
"We estimate that under the most optimistic of game probability scenarios, our entry had roughly a 12 percent chance of outscoring all competing submissions and just less than a 50 percent chance of finishing with one of the ten best scores," wrote Michael Lopez and Gregory Matthews in their paper in the Journal of Quantitative Analysis in Sports. The authors are professors at Skidmore College in Saratoga Springs, New York, and Loyola University Chicago, respectively.
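One plausible way to produce estimates like these is Monte Carlo simulation: assume some "true" game probabilities (the optimistic case being that your own submitted probabilities are correct), draw many possible sets of results, score every entry on each draw, and count how often yours comes out on top. The sketch below is an illustration under stated assumptions, not the authors' procedure: it uses log loss as the score, treats the tournament as a fixed list of independent games (ignoring that real bracket matchups depend on earlier results), and all names and inputs are hypothetical.

```python
import math
import random

def log_loss(probs, outcomes, eps=1e-15):
    """Average log loss of per-game win probabilities against 0/1 outcomes."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p) if y else math.log(1.0 - p)
    return -total / len(outcomes)

def finish_chances(mine, rivals, true_probs, n_sims=10_000, top_k=10, seed=0):
    """Estimate P(finish first) and P(finish in top_k) by simulating outcomes.

    mine, each rival, and true_probs are lists of probabilities that the
    first-listed team wins, over the same fixed list of games.
    Ties with a rival are counted in our favor.
    """
    rng = random.Random(seed)
    firsts = 0
    top_finishes = 0
    for _ in range(n_sims):
        # Draw one possible set of results from the assumed true probabilities.
        outcomes = [1 if rng.random() < p else 0 for p in true_probs]
        my_score = log_loss(mine, outcomes)
        beaten_by = sum(1 for r in rivals if log_loss(r, outcomes) < my_score)
        if beaten_by == 0:
            firsts += 1
        if beaten_by < top_k:
            top_finishes += 1
    return firsts / n_sims, top_finishes / n_sims

# Hypothetical inputs: 63 games, our entry taken as the truth (the optimistic case).
mine = [0.9] * 63
rivals = [[0.5] * 63, [0.7] * 63]
print(finish_chances(mine, rivals, true_probs=mine, n_sims=2_000))
```

With hundreds of rival entries, many of them close to one's own, even a well-calibrated entry is beaten on a large share of simulated tournaments, which is the kind of result the quote describes.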
Their method combined Las Vegas point spreads with efficiency metrics that describe a team's average performance per possession, or trip down the court.
But some other new research contends that if you really want to win your pool, you might have a secret weapon in your pocket.
"For all the hype, research and time taken to make the oh-so-careful selections, there's scant evidence that knowledge of the game makes any difference at all in bracket performance. 'A grandmother who's never seen a game has a similar chance of doing as well as her grandson who spends eight hours a day watching and researching basketball,' Kwak said."
So, develop a mathematical model to look at the fine details of matchups, flip a coin, or invent your own method. Whatever you do, be sure to make your picks before the tournament's first full day of games this Thursday.