Misconception

The RPI is the major determinant in getting a team into the NCAA Tournament. The NCAA committee places an incredible emphasis on this measure, making it probably the single most important number for every NCAA team. Beyond that, the RPI is a valid measure of team strength, and therefore it is appropriate to seed teams according to their RPI rank.

The Facts

The Ratings Percentage Index (RPI) was created in 1981 by the NCAA Men's Basketball Committee as a supplementary tool to help decide which teams make the tournament and where they are seeded. While the exact formula has never been publicly published, results were sent to NCAA schools beginning in 1992, after the season was complete. This practice no doubt prompted NCAA schools to pay more heed to the rating and resulted in a heightened emphasis on the RPI, both by the schools and the media.

More recently, the media and others (notably CBS, Jerry Palm, Jim Sukup and the Collegiate Basketball News) have highlighted the RPI, if not profiting from it, then at least linking it to the committee's selections of which teams make the NCAA Tournament and where they are seeded. This seems to have encouraged people to use the RPI for ranking teams against each other, which is neither the intended purpose nor the actual use of the RPI.

Rebuttal

What Exactly Does the RPI Measure?

The RPI is based on winning percentage. While there are some "secret" modifications made on top of the rating, the core of the rating consists of the following:

    Parameter                            % of Rating
    ---------                            -----------
    Team Winning %                           25%
    Opponent's Winning %                     50%
    Opponent's Opponent's Winning %          25%

It cannot be overemphasized that 75% of the rating comes not from anything the particular team in question actually accomplished on the floor, but from how its opponents fared. Understanding that point is an important step toward a clear understanding of the RPI.
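For the programmatically inclined, here is a minimal sketch of that core 25/50/25 calculation in Python. It implements only the weighting described above; the game results and team names are made up, and the NCAA's 'secret' modifications are not modeled.

    # A minimal sketch of the core RPI weighting described above:
    # 25% own winning %, 50% opponents', 25% opponents' opponents'.
    # Results and team names are hypothetical; the NCAA's undisclosed
    # adjustments are not modeled.

    def win_pct(team, games):
        """Fraction of its games that a team has won."""
        wins = sum(1 for w, l in games if w == team)
        losses = sum(1 for w, l in games if l == team)
        played = wins + losses
        return wins / played if played else 0.0

    def opponents(team, games):
        """All opponents a team has faced, repeats included."""
        return ([l for w, l in games if w == team]
                + [w for w, l in games if l == team])

    def average(values):
        return sum(values) / len(values) if values else 0.0

    def rpi(team, games):
        wp = win_pct(team, games)                                  # 25%
        owp = average([win_pct(o, games)
                       for o in opponents(team, games)])           # 50%
        oowp = average([average([win_pct(oo, games)
                                 for oo in opponents(o, games)])
                        for o in opponents(team, games)])          # 25%
        return 0.25 * wp + 0.50 * owp + 0.25 * oowp

    # Example with made-up results: each tuple is (winner, loser).
    games = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
    print(round(rpi("A", games), 3))   # 0.604

Note how most of team A's number is built from records it had no part in producing.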

Why Is the RPI a Poor Model for Team Strength?

The major reason the RPI is a poor model for determining team strength is that it is too simplistic to reliably differentiate teams, and it relies completely on the assumption that winning % is a valid indicator of how strong a team is. Certainly a strong correlation is expected between winning games and how good a team is; however, when you attempt to rank actual teams against each other, the correlation is not as strong as one would think, given the wide range in talent levels found within Division I. If every school played the same schedule (meaning, in practice, that each at least played every other school in Division I on a neutral court), then relying on winning % would be much more valid. But the reality is that schedules are not the same and consist of games against teams of widely varying talent levels. Perhaps most importantly, only a subset of the teams a school is being ranked against are actually met on the court. This exposes the critical flaw in the system: the rating is so simplistic that it cannot overcome these basic problems.

A Fuzzy Core

There are generally five or six conferences in Division I which are considered power conferences and whose members can, in general, be expected to be among the top 150 teams in the country, even the teams which finish in last place. Compare this to a low-level conference where none of the teams are considered within the top 150 teams in the country. Obviously, one or two teams from that conference will do very well, and in the process accumulate a respectable won-loss record along the way, almost entirely against what is poor competition (in the context of Division I as a whole). Comparing only the won-loss percentage of the last-place team of a power conference with the won-loss percentage of a low-level conference champion, with no regard for the schedule each school actually played, one would be completely misled as to which team was the stronger. This type of scenario plays out not just in the above example, but between the many levels of Division I. So in actuality, won-loss percentage is often not a clear-cut measure of how strong a team is. That is why I consider it a 'fuzzy' measure of team strength, as opposed to other systems which I would consider more direct AND accurate.

Now granted, the RPI is not as poor as the above scenario. In its own way, the rating does adjust for the schedule, through the remaining portion of the rating which comes from the opponent's winning % (50%) and the opponent's opponent's winning % (25%). You could imagine the 25% of the rating which is based on the team's own winning percentage as being attenuated (corrected) by the 50% which is based on the opponents' records. So, for example, if a team has won all of its games, but against opponents with terrible records, the team's high 25% portion of the RPI will be pulled down by the 50% portion, to reflect that the 25% may not be an accurate gauge of its actual strength. But again, this 50% portion is based not on an actual measure of strength, but on the very same won-loss principle as before. So in effect, the 'fuzzy' 25% measure is being attenuated by a 'fuzzy' 50% measure. Of course the RPI goes one step further, adding another 25% based on opponents' opponents' won-loss percentage; you could imagine this as an attempt to attenuate the shortcomings of the 50% portion. However, this final 25% is still 'fuzzy'. To make matters worse, one could ask what is attenuating this last 25% to make sure it is an accurate indicator of the opponents' opponents' strength? The answer is nothing at all.
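A hypothetical worked example (the percentages are invented for illustration): suppose a team is undefeated (winning % = 1.000), its opponents have collectively won 35% of their games, and its opponents' opponents have won 50% of theirs. The core rating is then 0.25 x 1.000 + 0.50 x 0.350 + 0.25 x 0.500 = 0.550, a middling number despite the perfect record. The perfect record has been attenuated, but by figures (0.350 and 0.500) that nothing in the formula verifies as true measures of strength.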

The Devil's in the Details

The second blow to the RPI system comes from the fact that it disregards many pieces of data which would allow it to firm up the basic results and sharpen where teams rank relative to each other. Obvious and chief among these is the actual scoring margin between teams. To beat a team by 20 points rather than 2 is strong evidence that your team is much stronger than your opponent. There are other important factors, such as where the game was played, whether teams are on extended winning or losing streaks, how teams do in close games, etc., which are simply not even considered in the RPI model. There are plenty of models explicitly designed to measure a team's strength, such as Sagarin's and Massey's, which utilize this type of information and more. In comparison, the RPI is not in the same league and cannot be considered a serious model.

Another important result of this difference in level of detail between the RPI and competing models is the ability to gauge an opponent's strength. Measuring this factor accurately is key to the validity of the final result. Obviously, beating an opponent by 10 points is only really useful information if you can accurately gauge how strong the opponent actually was. Because these other (non-RPI) models explicitly try to measure each team's strength, they can use those same measures to gauge each opponent's strength. The RPI, in comparison, is mired in the scenario described above, where its measure of an opponent's strength is not even the opponent's full RPI, but an unattenuated partial RPI.


An Analogy

If it is still not clear that the RPI is an inferior model of team strength, please consider the following simple analogy. It attempts to illuminate the basic differences between the RPI and the other systems.


Shifting Sands

Of course in the above scenario, Rip's method is similar to the RPI rating method, while Jeff's method is probably the most basic starting point for models explicitly designed to measure team strength (i.e. Jeff Sagarin, Massey Ratings, etc.).

Some may ask: why, then, aren't these other models perfect indicators of team strength from the get-go? The answer is that 1.) there is no direct way to measure something as ambiguous as team strength (as compared to, say, measuring the height of a tree), 2.) actual team strength changes as the season progresses, and 3.) there are other factors at play in college basketball which often allow weaker teams to win games against stronger competition on a given night. Below are more details about each.

1.) In order to assess something like team strength, there are literally hundreds of criteria which could potentially be used. How many points a team scores, the strength of each opponent and how well the team did against them, whether they win or not, how many points they win by, how strong their defense is, whether they play more than their share of games at home, on the road or on neutral sites, etc. are obvious ones (most of which, incidentally, aren't even considered by the RPI). Some criteria are stronger than others and thus deserve more weight, while others are weaker and often end up muddling the result by providing contradictory information. Sorting through which criteria are important, what variances are inherent in each measure, and how they should be factored in is the domain of the various models and is what differentiates them from each other. Most of the details are proprietary. So although the different systems will produce different results (based on how each model is set up), almost every one should be more accurate than the RPI, which is hopelessly simplistic.

2.) Another key point is that team strength changes from day to day and week to week during the season. Players gain more confidence, they learn the system and become better at executing it. Other teams may have internal strife or lose confidence in themselves or their coach. Winning streaks and losing streaks can do wonders to the psyche of a team. Other important events may be the reinstatement of a key player or an injury.

All of these events have a definite impact on actual team strength, although accounting for these types of events is often the weak point of many mathematical rating systems. Many simply aren't designed to recognize, for example, when a key player is injured, and to reassess team strength based on that. Even the ones that do may not be able to accurately determine how well the replacement will play or how well his teammates will take up the slack. Most mathematical models (the RPI included) are completely reliant on the event eventually impacting the results (i.e. winning % in the case of the RPI, margin of victory against a particular opponent in the case of Sagarin). The problem is that these events often happen during the season, and not enough games are played afterward to appreciably affect the ratings, especially ratings which don't differentiate between recent games and games played early in the season.

This type of situation reveals another important point: it is foolish to rely completely on computer rankings. Weekly polls, as maligned as they are, are actually much more adaptable in compensating for these types of events and thus are (and should be) important factors in assessing how strong a team is compared to others (and therefore directly relevant as one tool for the NCAA Committee's seeding of all teams). Another strength of polls is that they can often easily dismiss teams which may look great on the computer screen but have obvious deficiencies when seen in person.

3.) Even beyond the above factors which can confuse mathematical models, there is a final one which can completely play havoc with them: there are many unknown and unmeasurable factors which can cause a game's outcome to be the opposite of what might be predicted on paper. These can include anything from what a player had for dinner, a bad night's sleep, overconfidence, lack of focus due to outside influences, poor officiating, fouling out, a hot 3-point shooter, a lucky bounce, etc. The list is endless, and college basketball is particularly susceptible (compared with sports like football) because there are far fewer players, so a factor affecting one of them can have a much greater impact on the game's outcome. No computer model can (or should) be expected to account for these types of factors.

The above demonstrates that there are countless reasons why even a perfect model cannot be expected to correctly pick the best team on any given night. The important point, in terms of this paper, is that the factors which adversely affect the Sagarins etc. also adversely affect the RPI. So even if these other models aren't perfect, they're still ahead of the RPI.

Why Should the NCAA Scrap (or Deemphasize) the RPI?

There are five major reasons why the RPI needs to be reevaluated by the NCAA, and either completely overhauled or scrapped altogether. These are listed below.

The first reason deals with the basic flaws in the model itself [described in detail above] which affect its ability to reliably compute what it was intended to compute (i.e. a team's schedule strength and how the team fares against that schedule). There are certainly other models out there [and most certainly created by someone with a better understanding of mathematics and statistics than whoever dreamed up the RPI] which could do this more reliably.

I've Got a Secret

An interesting, and in a way disappointing, recent development has been the committee's use of a 'secret' adjustment, most likely in response to the fact that many RPI results defy logic to those expecting the RPI to be a measure of team strength. Although the details are not known exactly, this adjustment is almost certainly an attempt to correct for the wide range in talent found between conferences within Division I alone. Some may claim this makes the RPI more accurate, and it probably does. But the fact that this type of tacked-on correction is necessary at all suggests to me that the basic model is fundamentally flawed.

This 'secret' adjustment is cited by some RPI apologists as a counter to the critics. They note that although the freely available RPI lists (such as Jerry Palm's, which is not modified from the basic model) don't seem to square with reality, those aren't the 'real' RPI numbers, which the NCAA computes. However, since this 'real' list isn't made available and the formula is 'secret', there's no way one can argue against it. It's a perfect shell game.

Except for one thing. The actual RPI numbers as computed by the NCAA are released to the schools after the tournament. I personally haven't seen such a list and compared it to the unaltered lists such as those computed by Palm; however, Palm has, and he has told me that he hasn't seen significant differences between the two. So whatever 'secret' adjustments are being made, they aren't significantly altering what is in reality a very poor and fundamentally flawed model.


The second major strike against the RPI is that the weaknesses within the model itself allow major schools to manipulate it, attaining a relatively high RPI without really playing as difficult a schedule as the RPI would imply. The lack of specificity between schools with good records in major conferences and schools with good records in mid-major or weak conferences can be exploited, as can the model's lack of detail, which ignores factors like home games versus away games. A hypothetical numeric illustration follows.
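As a concrete (and entirely invented) illustration, consider two teams with identical 20-5 records but different opponent pools, run through the core 25/50/25 formula from earlier. These are not real schedules; the opponent percentages are assumptions chosen to show the effect.

    # Hypothetical illustration of schedule manipulation; the records
    # and opponent percentages below are invented, not real schedules.

    def core_rpi(wp, owp, oowp):
        """Core RPI: 25% own win %, 50% opponents', 25% their opponents'."""
        return 0.25 * wp + 0.50 * owp + 0.25 * oowp

    record = 20 / 25    # both teams finish 20-5 (.800)

    # Team A schedules winners from weak leagues: its opponents carry
    # gaudy records (.600) built against poor competition.
    print(core_rpi(record, 0.600, 0.500))   # 0.625

    # Team B plays major-conference foes who beat each other up and
    # hover near .500, so the formula rates B's schedule as weaker.
    print(core_rpi(record, 0.500, 0.550))   # 0.5875

Because the formula sees only won-loss records, the team that stockpiles opponents with inflated records, rather than genuinely strong ones, comes out ahead.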

The Golden Rules

Below is an actual example of how a team can construct a schedule, based on a few principles designed to exploit the RPI's weaknesses, in order to boost its RPI rating. This is excerpted from a post by Phil Beineke on rec.sport.basketball.college, December 12, 1999, titled "How to Beat the RPI".


The third reason why the NCAA should rethink the RPI is that it is unfairly skewed toward the top conferences. Not that the top conferences aren't deserving of the majority of at-large bids, only that even after a stellar regular season, unless a mid-major team wins its conference tournament, it is virtually locked out of an NCAA bid. Dan Wetzel, in an article ("RPI continues to lack an ounce of reason", CBS Sportsline, Jan. 9, 2002), argues that the RPI formula artificially props up teams in major conferences while mid-majors steadily see their RPI ratings erode due to the overall lack of winning % in the conferences they happen to compete in. He also argues that because the top conferences hold the majority in the NCAA councils which determine these things, there is no incentive to do away with the RPI, since doing so goes against their financial interests. He writes:

Wetzel goes on...

Another point made in the article is that even for quality mid-major teams with quality out-of-conference wins (i.e. Gonzaga, Butler, Kent State), the drag on their schedule rating from their conference not only makes it difficult to get into the tournament; even if they do make the dance, their seeding will be adversely affected. That realistically limits the revenue they will generate, since they must upset tougher first-round opponents than they should have to face in order to advance.

Beyond that, the converse of point #2 above is also true. That is, while it is possible for teams from the top conferences to manipulate their RPI ratings with clever scheduling ploys, that luxury isn't always available to their mid-major counterparts. In fact, a lack of scheduling options can often hurt a mid-major's rating. Another quote from Wetzel's article:


The fourth reason why the NCAA should rethink the whole RPI idea is that, apparently, the people who should know the most about it seem to know the least. The media is hopelessly confused, continually suggesting the RPI is a measure of team strength when it's not. Dick Jerardi of the Philadelphia Daily News (after he saw the light) gives some insight into why the majority of the media might hold on to the RPI idea.

Even the coaches and college administrators, the audience the RPI is supposed to spur into scheduling tougher opponents, seem not to have a clue.

Confusion Abounds


The fifth reason why the NCAA should drop the RPI is that it is becoming apparent, even to some of the more clueless journalists, that the NCAA is in fact not being completely honest about the criteria it uses.

The Shell Game

In the March 3rd, 2003 issue of Sports Illustrated ("A Madness to the Method?"), Alexander Wolff wrote an article discussing the RPI. He did a good job of gaining access to the major players: Jim Sukup, Jerry Palm, Jeff Sagarin and Gary Johnson, the senior assistant director of statistics at the NCAA who was in charge of maintaining and tweaking the 'official' RPI. Despite much confusion on Wolff's part, some very telling nuggets of information still emerged from the article.

The most telling were the revelations that since 1984, the committee has requested Sagarin's ratings for its deliberations. Beyond that, according to Sagarin, a committee member once confided to him that the committee does indeed use and appreciate his rankings. This is certainly understandable, since the Sagarin ratings are well respected and are specifically designed to reflect team strength, exactly the most useful measure for determining the best teams among prospective at-large squads.

According to Sagarin's account, the committee member added that although the committee likes his ratings, "we can't say that." The reason is that Sagarin's ratings take into account margin of victory. Wolff explained earlier in his article that the NCAA is averse to anything which seems to reward margin of victory, as it encourages coaches to run up the score and it smacks of the point spreads often associated with gambling, an influence which has been and remains a threat to college basketball and sports in general.

From the above, it appears that the NCAA committee is indeed using more sophisticated and accurate measures of team strength and schedule strength than the RPI; they're just not telling anyone. For anyone who has compared the actual teams the NCAA has invited to the tournament along with where they've been seeded, this revelation shouldn't be surprising at all. In fact, the RPI has consistently fared poorly in this comparison, indicating that other factors besides the RPI play important roles in the committee's decisions.

As a postscript to the article in Sports Illustrated, Wolff wrote a follow-up article ("Replacing the RPI," SI.com March 1, 2003) where he expressed even more misgivings about the RPI than he did in the magazine and finally began to see the issue much more clearly. Wolff wrote:


The fact remains that the RPI has taken on a life of its own, and its actual purpose and use by the committee have been lost in the hype. Unfortunately, the NCAA has never been an organization good at admitting mistakes. Instead of admitting these inherent problems or scrapping the system outright (in favor of a more mathematically sound model of schedule strength), they seem to avoid discussing the RPI in much detail or explaining, both to the media and to the coaches (whom the RPI is supposed to influence), what exactly the RPI is meant to measure and how it's used.

Now That I'm Confused, What Is the RPI Exactly Again?

The following is all my opinion; take it for what it's worth. The RPI was created by the NCAA Tournament Committee to help differentiate teams when determining which bubble teams made the final cut, and sometimes to choose between two otherwise equal teams for a particular seed. It is one of many criteria at the committee's disposal, no more, no less, and not even one of the major criteria, IMO. The RPI itself is more a measure of a team's schedule strength (i.e. it tries to rank teams by how difficult their schedules were), with a good portion added for the team doing well and for not scheduling obviously inferior teams. I believe the NCAA began to publicize the information in an attempt to persuade teams to schedule tougher opponents than they might have in the past. The RPI was simply the NCAA's way of telling bubble teams that they need to toughen up their schedules.


From the Horse's Mouth

Below is a statement about the committee's use of the RPI from the NCAA committee chairman at the time (NCAA News, February 8, 1995). This comes from Jerry Palm's College RPI site and, I believe, supports some of my thoughts above.

A more recent committee chairman, C.M. Newton, is also quoted as saying that the RPI is 'one of many' criteria used by the committee. (JPS Note - I would appreciate it if someone could supply me with the exact reference.)

And more recently, the 2001 NCAA Tournament Committee Chairman, Big East commissioner Mike Tranghese, was questioned about some of the things which are important to the committee. He mentioned as very important the Coaches Advisory Poll, a poll put together at the end of the season in which coaches vote on and rank teams only in their own region. When asked about the RPI by the Sporting News, this is what he said.

- from "March turns mad for Big East commissioner Tranghese", by Jeff D'Alessio, The Sporting News, March 1, 2001.


The Truth is Staring You in the Face

I briefly alluded (earlier in the page) to various web sites and services which have helped push and publicize the RPI, and I am critical of them for the way they tend to intermingle the RPI with what the committee does and where teams are seeded (something I find completely overblown). However, it is interesting to note that despite this, some of these sites have been very candid about what the RPI actually is. Jerry Palm of College RPI, for example, has clearly stated in his RPI FAQ that,

Beyond that, Mr. Palm has mentioned to me,

With the number of college basketball fans who closely follow the RPI rankings and use the above (and other) sites, one would think that some of them would actually take the time to read what the RPI actually is. For whatever reason, many basketball fans don't. (P.S. I'd like to take the opportunity to thank Mr. Palm for providing information about the RPI and historical rankings on his site.)


Where the Rubber Meets the Road

To see how this all works, it is probably instructive to run through some real-life examples comparing RPI results with actual NCAA fields. Below are links to comparisons of the actual NCAA fields for the 2000-01 and 2001-02 seasons with various rating models and polls, including the RPI.

Take, for example, the 2000-01 results, where the actual seeds as determined by the NCAA committee were correlated with various ratings (such as the RPI, Sagarin, Massey) and polls (such as the AP) to see which measure was best. In this example, the polls did a better job of correlating with the NCAA's decisions than any mathematical model, including the RPI. In fact, the RPI was the worst predictor of where teams should be seeded of any of the models.

This evidence flies directly counter to the prevailing 'wisdom' spouted by media outlets about the importance of the RPI to NCAA seeds, yet it shouldn't be surprising to those who have read this page. After all, of all the models, the RPI is the one which isn't really even a measure of team strength, but rather a measure of schedule strength. So the real question is: why should the RPI correlate well with the NCAA seeds? The answer is that although there is some commonality which will help the RPI correlate, it really doesn't have to.

(As an aside, the main conclusion from these correlations is that no single measure is best at correlating with the NCAA's seeding decisions. In fact, it is far better to take all the quality ratings and polls and average them, since that average turns out to produce a far better correlation. A sketch of this kind of comparison appears below.)
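For those curious how such a comparison might be run, below is a rough sketch. The seed orderings are placeholders, and a simple Spearman rank correlation is used purely for illustration; the actual comparisons linked above may have used a different method.

    # Sketch of correlating a rating's ranking with the committee's
    # seed order. All numbers are placeholders, not real seeds.

    def spearman(rank_a, rank_b):
        """Spearman rank correlation for two equal-length rankings."""
        n = len(rank_a)
        d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
        return 1 - (6 * d2) / (n * (n ** 2 - 1))

    committee = [1, 2, 3, 4, 5, 6]   # committee's ordering of six teams
    rating_x  = [2, 1, 3, 5, 4, 6]   # one rating's ordering of the same teams
    rating_y  = [1, 3, 2, 6, 4, 5]   # another rating's ordering

    print(spearman(committee, rating_x))    # ~0.886
    print(spearman(committee, rating_y))    # ~0.771

    # Averaging the two ratings' ranks and re-ranking, as suggested
    # above, agrees with the committee better than either alone here.
    avg_rank = [(x + y) / 2 for x, y in zip(rating_x, rating_y)]
    order = sorted(range(len(avg_rank)), key=lambda i: avg_rank[i])
    consensus = [order.index(i) + 1 for i in range(len(avg_rank))]
    print(spearman(committee, consensus))   # ~0.943 in this toy example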

In the example of the 2001-02 season, the RPI fares better (or perhaps more correctly, the competing models fare worse) than the previous year. When looking at the top 6 seeds (basically the top 25), the RPI was once again the worst predictor (at least among the mathematical models; the ESPN/USA Today poll was the worst overall). However, when looking at the top 12 seeds (basically the major conference champions along with the at-large teams that year), the RPI improves so much that it jumps over the Massey rating in terms of correlating with the NCAA field of top 12 seeds. Looking back at the 2000-01 results, one can see that the RPI also made a healthy gain in its correlation when going from the top 6 seeds to the top 11 seeds (at least in comparison to the other models), although this still wasn't enough to pull it out of last place in 2001.

Why might this be happening? The RPI's correlation improves so much more in going from the top 6 seeds to the 'at-large' seeds than competing models (such as Sagarin and Massey) that it appears the RPI does exert some appreciable influence with the NCAA committee when it comes to this particular set of teams (i.e. bubble teams). Again, for someone who has read this page, this shouldn't be all that surprising, since it has been outlined above that the NCAA committee does indeed use the RPI somewhat as a tool, especially when considering which bubble teams deserve an at-large bid.

So the take-home from these examples is that when someone talks about the RPI being an important determinant of seeding, they need to qualify themselves. First, it is far better to take into account the aggregate picture of a team (i.e. where it is ranked in various polls, what its Sagarin, Massey, RPI rankings etc. are) than to base one's expectation on any single measure (including the RPI). Beyond that, the RPI really isn't very important for at least the top 6 seeds (basically the top 25 teams in the nation) and is in fact likely the poorest predictor of seedings for those teams. While it does have some measurable influence on the later seeds, it's still not much better (if not worse) than competing models. So in reality, only fans of bubble teams should really concern themselves with their team's RPI ranking, and even then it's far from the crucial measure the media sometimes makes it out to be.


Discrepancy, or Just Common Sense?

To see whether you were paying attention, consider the following scenarios.


The second example of an apparent discrepancy between perception and reality of the RPI is illustrated below.

If anyone has any comments or suggestions, please mail me. I'd be particularly interested in any direct quotes from NCAA Committee members regarding how the process works and how much weight they actually give to the different criteria. I'd also be interested in any statistical analysis comparing RPI results with other computer rankings, and how those correlate with things like polls, tournament success, etc. I did an analysis of the 2001 NCAA Tournament which you might be interested in.

Return to Kentucky Wildcat Basketball Page.

Please send comments to Jon Scott