2009-10 NCAA Seeding Correlation

2009-10 NCAA Tournament Seedings & Results

The table below shows the predicted rank of the top 47 teams (the twelve seed lines, excluding New Mexico State) selected by the NCAA Tournament committee. The purpose is to compare various polls and rating systems to see which method best predicts the seeds the committee assigned. Each rating system (the RPI, Sagarin, Pomeroy, and Massey rating systems, along with the Associated Press sportswriters poll and the ESPN/USA Today coaches poll) was sampled after the conference tournaments were completed and before the NCAA Tournament began, and the rank of each team that made the tournament was recorded. This data was then regressed against the average ranking a team of that particular seed would be expected to earn (except for the #1 seeds, where the committee did reveal the specific order).

So for example, a number two seed would be expected to be one of the fifth through eighth best teams in the tournament, and thus would have an average ranking of (5 + 6 + 7 + 8)/4 = 26/4 = 6.5. This approximation is used because it is generally unknown how the NCAA committee ranked teams within each seed line (and, as comments by the committee chairman in 2010 illustrated, the S-curve is not faithfully followed and thus cannot be relied upon).
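The seed-to-expected-rank mapping above can be sketched in a few lines. This is a minimal illustration, not the author's actual spreadsheet: it assumes seed line s (for seeds 2 and up) spans overall ranks 4(s-1)+1 through 4s, exactly as in the #2-seed example.

```python
def expected_rank(seed: int) -> float:
    """Average overall rank expected for a team on the given seed line.

    Seed s spans overall ranks 4*(s-1)+1 through 4*s; the expected rank
    is the mean of those four positions. (The #1 seeds are a special
    case, since the committee revealed their specific order.)
    """
    first = 4 * (seed - 1) + 1
    return sum(range(first, first + 4)) / 4

# A #2 seed spans ranks 5-8, matching the worked example above:
print(expected_rank(2))   # 6.5
```

The same formula gives 10.5 for a #3 seed, 14.5 for a #4 seed, and so on down the bracket.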

The data was regressed, and the error between the predicted and actual rank was measured using the r-squared function (RSQ in Excel). For those unfamiliar with this function, r-squared varies between 0 and 1.0, with a higher value indicating a better fit. Because each team is regressed against an average ranking (which introduces error by construction), the best achievable regression is not an r-squared of 1.0 (which would correspond to zero error) but a lower value, shown at the bottom of the table. The r-squared of each model was therefore normalized to that best-achievable value to account for the artificially introduced error.
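The normalization step can be made concrete with a small sketch. The data below is hypothetical (just the top twelve teams and the first three seed lines), but it shows why even a "perfect" model cannot reach an r-squared of 1.0 when regressed against averaged seed ranks:

```python
import numpy as np

def r_squared(predicted, expected):
    """Squared Pearson correlation, equivalent to Excel's RSQ()."""
    r = np.corrcoef(predicted, expected)[0, 1]
    return r * r

# A model that ranks the top 12 teams in exactly the committee's order...
true_order = np.arange(1, 13)
# ...is still compared against the averaged ranks for seeds 1-3,
# because the within-seed order is unknown:
seed_avgs = np.repeat([2.5, 6.5, 10.5], 4)

best = r_squared(true_order, seed_avgs)   # < 1.0 by construction

def normalized_rsq(model_rsq):
    """Divide a model's rsq by the best-achievable value."""
    return model_rsq / best
```

With this toy data `best` comes out around 0.9 rather than 1.0, which is the artificially introduced error the text refers to; dividing each model's r-squared by it puts the models on a common scale.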

[Note that it has been suggested that using the Pearson correlation function ('CORREL' in Excel) may be more appropriate for this type of analysis. Using it gives nearly exactly the same results as the normalized r-squared analysis, so I decided to keep r-squared, mainly because more people are familiar with it and because r-squared was used in previous years, making it easier to compare results between years. If anyone has a better suggestion for how to correlate this data, please let me know at the address at the bottom of this page.]
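The near-identical results are no accident: for a single pair of data series, Excel's RSQ is exactly the square of CORREL, so the two measures always rank models in the same order. A quick sketch with made-up ranks (not the actual 2010 data) shows the relationship:

```python
import numpy as np

# Hypothetical model ranks for eight teams vs. their averaged seed ranks:
pred = [3, 1, 2, 5, 4, 7, 6, 8]
exp = [2.5, 2.5, 2.5, 2.5, 6.5, 6.5, 6.5, 6.5]

pearson = np.corrcoef(pred, exp)[0, 1]   # Excel's CORREL
rsq = pearson ** 2                       # Excel's RSQ
```

Since squaring is monotone for positive correlations, any model that scores higher under CORREL also scores higher under RSQ, which is why switching functions barely changes the conclusions.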

The plot below shows the difference between the predicted and actual seedings. A perfect fit would be a straight line through the center of the graph from the bottom left to the top right.

Preliminary Conclusions

There are other aspects of these results worth considering: namely, how well the models predict the teams that were invited to the field, and whether teams the models left out of their projected fields also failed to make the committee's final cut.

In the first case, the following set of tables looks at the top 50 teams each model predicted (the top 47, in this case, made the at-large field) and notes the teams that were predicted to be in the field but were not invited by the committee.

Conclusions: The polls do not typically reach down to the bubble teams, so their results are not really meaningful, other than to note that there were cases (three in the AP poll, one in the ESPN poll) where teams received votes but were not placed in the at-large field.

In terms of the mathematical models, Sagarin's did the best at not including teams which ultimately missed the field. Its only miss, one shared by all the other models save the RPI, was Virginia Tech. In terms of team strength, Pomeroy, Massey, and Sagarin all had Virginia Tech solidly in the field. The RPI, which is more a quasi-strength-of-schedule rating than a traditional power rating (see this link for more information), listed Virginia Tech outside the field at #59. This accords with the view among pundits that Virginia Tech was passed over due to its poor schedule strength, particularly out of conference.

In the next case, teams that did make the field but did not appear within a model's top 47 teams are highlighted.

Conclusions: Again, the poll results are typically not relevant here, since they generally don't reach down to the bubble teams, much less beyond them.

In terms of the mathematical models, the best was again Sagarin's, which had Florida just outside the at-large field at #48. The NCAA committee had Florida solidly in the tournament as a #10 seed, although pundits questioned that choice; compared to other bubble teams (such as Mississippi State), it is not clear that Florida was demonstrably better.

The model that showed the most variation was the RPI. Again, this is not surprising, given that the RPI isn't a traditional power rating like the other models: it was originally designed around schedule strength, and is thus more influenced by whom a team played than by how strong the team actually is. (See the following page for more information on this.)

One Last Comparison

One final comparison is included to bring home the power of averaging the disparate models above. A favorite pastime of some fans is to study "Bracketology" and try to determine who makes the field and who doesn't; there is even a cottage industry of 'bracketologists' who claim to be experts in the topic.

The table below compares the predicted field of at-large teams against the field chosen by the NCAA committee, based on averaging the above results and reordering the teams accordingly. Over a dozen additional teams on or near the bubble were also included, to see whether any might have been overlooked.

Below are the results from reorganizing the teams based on the average rank.
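The averaging-and-reordering step described above can be sketched as follows. The team names and ranks here are entirely hypothetical; the idea is simply that each team's rank in every system is averaged, and the field is re-sorted by that average before applying the 47-team cut-off:

```python
# Hypothetical ranks per team, in the order:
# RPI, Sagarin, Pomeroy, Massey, AP, ESPN
ranks = {
    "Team A": [45, 41, 44, 47, 50, 43],
    "Team B": [59, 40, 38, 42, 55, 49],
    "Team C": [46, 48, 52, 50, 47, 51],
}

# Average each team's rank across all six systems...
avg = {team: sum(r) / len(r) for team, r in ranks.items()}

# ...then reorder the bubble by average rank (lower is better).
field = sorted(avg, key=avg.get)
```

Teams whose average rank falls inside the top 47 would be predicted to make the field; averaging tends to smooth out a single system's outlier (such as the RPI's #59 in the Team B row above).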

What is found is that the results of who made the field and who didn't match the NCAA-chosen field nearly perfectly, right down to the cut-off of 47 teams (teams left out of the tournament are shown in purple). The only 'mistake' is Virginia Tech, which was predicted to be in the field as a #11 seed while Florida was predicted to be just outside it. These happen to be two of the committee's choices most questioned by the pundits.

In other words, the above analysis predicted who made the field and who was left out as well as most of the professional bracketologists did. For example, Joe Lunardi also picked the field correctly with one exception: he included Illinois as a #12 seed at the expense of Florida.

Last Updated May 14, 2010

Return to Kentucky Wildcat Basketball Page or RPI Page.

Please send all additions/corrections to