Conventional wisdom holds that the RPI is the major determinant in getting a team into the NCAA Tournament: that the committee places enormous emphasis on this measure, making it probably the single most important number for every NCAA team, and, beyond that, that the RPI is a valid measure of team strength, so it is appropriate to seed teams according to their RPI rank. This paper argues that none of these assumptions holds up.
The Ratings Percentage Index (RPI) was created in 1981 by the NCAA Men's Basketball Committee as a supplementary tool to help decide which teams make the tournament and where they are seeded. While the exact ratings formula has never been published, the computed results have been sent to NCAA schools after each season since 1992. This practice no doubt prompted NCAA schools to pay more heed to the rating and resulted in a heightened emphasis on the RPI, both by the schools and by the media.
More recently, the media and others (notably CBS, Jerry Palm, Jim Sukup and the Collegiate Basketball News) have highlighted the RPI, profiting from, or at least linking, the RPI to the committee's selections of which teams make the NCAA Tournament and where they are seeded. This seems to have encouraged people to use the RPI for ranking teams against each other, which is neither the intended purpose nor the actual use of the RPI.
What Exactly Does the RPI Measure?
The RPI is based on winning percentage. While there are some "secret" modifications made on top of the rating, the core of the rating consists of the following:
25% | Team Winning %
50% | Opponents' Winning %
25% | Opponents' Opponents' Winning %
It cannot be overemphasized that 75% of the rating comes not from anything the team in question actually accomplished on the floor, but from how its opponents fared. Understanding that point is an important step toward a clear understanding of the RPI.
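To make the weighting concrete, here is a minimal sketch of the commonly described core formula. The function name and the sample numbers are illustrative, not the NCAA's, and the official version adds unpublished adjustments on top:

```python
def core_rpi(wp, owp, oowp):
    """Core RPI as commonly described: 25% the team's own winning
    percentage, 50% its opponents' winning percentage, and 25% its
    opponents' opponents' winning percentage."""
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp

# An undefeated team with a weak slate can rate well below a team
# that lost several games against strong opposition:
weak_slate   = core_rpi(wp=1.000, owp=0.350, oowp=0.450)  # -> 0.5375
strong_slate = core_rpi(wp=0.700, owp=0.650, oowp=0.550)  # -> 0.6375
```

Note that the team's own result enters only through the first term; everything else is its opponents' records.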
Why Is the RPI a Poor Model for Team Strength?
The major reason the RPI is a poor model for determining team strength is that it is too simplistic to reliably differentiate teams; it relies completely on the assumption that winning percentage is a valid indicator of how strong a team is. Certainly there is expected to be a strong correlation between winning games and how good a team is. However, when you attempt to rank actual teams against each other, the correlation is not as strong as one would expect, given the wide range in talent levels found within Division I. If every school played the same schedule (in practice meaning that each at least played every other Division I school on a neutral court), then relying on winning percentage would be far more valid. But the reality is that schedules differ, and consist of games against teams with widely varying talent levels. Perhaps most importantly, only a subset of the teams a school is being ranked against are actually met on the court. This exposes the critical flaw in the system: the rating is too simplistic to overcome these basic problems.
A Fuzzy Core
There are generally five or six conferences in Division I which are considered power conferences, and whose members can generally be expected to rank among the top 150 teams in the country, even the teams which finish in last place. Compare this to a low-level conference in which none of the teams rank within the top 150. Obviously, one or two teams from that conference will do very well, and in the process accumulate a respectable won-loss record, almost entirely against what is, in the context of Division I as a whole, poor competition. Comparing only the won-loss percentage of a power conference's last-place team with that of a low-level conference champion, with no regard for the schedule each school actually played, one would be completely misled as to which team was stronger. This scenario plays out not just in the above example but between the many levels of Division I. So in actuality, won-loss percentage is often not a clear-cut measure of how strong a team is. That is why I consider it a 'fuzzy' measure of team strength, as opposed to other systems which I would consider both more direct and more accurate.
Now granted, the RPI is not as poor as the above scenario. In its own way, the rating does adjust for schedule, via the remaining portions of the rating: the opponents' winning percentage (50%) and the opponents' opponents' winning percentage (25%). You can think of the 25% of the rating based on the team's own winning percentage as being attenuated (corrected) by the 50% based on the opponents' records. If, for example, a team has won all its games, but against opponents with terrible records, its high 25% portion will be dragged down by the 50% portion, reflecting that the 25% may not be an accurate gauge of its actual strength. But this 50% portion is based not on an actual measure of strength, but on the very same won-loss principle as before. So in effect, the 'fuzzy' 25% measure is being attenuated by a 'fuzzy' 50% measure. The RPI goes one step further, adding a final 25% based on opponents' opponents' won-loss percentage, which you can view as an attempt to attenuate the shortcomings of the 50% portion. But this final 25% is still 'fuzzy'. Worse, one could ask what attenuates this last 25% to ensure it is an accurate indicator of the opponents' opponents' strength. The answer is nothing at all.
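As a sketch of how these three 'fuzzy' layers are assembled, the code below computes each layer from a bare list of results. It follows the commonly described convention that an opponent's record excludes its games against the team being rated; the tiny three-team pool is invented purely for illustration, and the NCAA's unpublished adjustments are ignored:

```python
from statistics import mean

def win_pct(team, games, exclude=None):
    """Winning % of `team`; optionally excludes games vs `exclude`,
    per the usual RPI convention for opponents' records."""
    played = [g for g in games if team in g[:2] and exclude not in g[:2]]
    return sum(1 for g in played if g[2] == team) / len(played) if played else 0.0

def opp_wp(team, games):
    """Average winning % of a team's opponents (the 50% layer)."""
    opps = [a if b == team else b for a, b, _ in games if team in (a, b)]
    return mean(win_pct(o, games, exclude=team) for o in opps)

def core_rpi(team, games):
    opps = [a if b == team else b for a, b, _ in games if team in (a, b)]
    wp   = win_pct(team, games)
    owp  = opp_wp(team, games)                   # attenuates the 25% layer...
    oowp = mean(opp_wp(o, games) for o in opps)  # ...but nothing attenuates this
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp

games = [("A", "B", "A"), ("B", "C", "B"), ("A", "C", "A")]  # (team1, team2, winner)
```

With this invented pool, `core_rpi("A", games)` works out to 0.625: a 1.000 winning percentage contributing only a quarter of the rating, with the rest coming from A's opponents.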
The Devil's in the Details
The second blow to the RPI system comes from the fact that it disregards many pieces of data which would allow it to firm up the basic results and sharpen where teams rank relative to each other. Chief among these is the actual scoring margin between teams: beating a team by 20 points rather than 2 is strong evidence that your team is much better than your opponent. Other important factors, such as where the game was played, whether teams are on extended winning or losing streaks, and how teams do in close games, are simply not considered in the RPI model. There are plenty of models explicitly designed to measure a team's strength, such as Sagarin's and Massey's, which utilize this type of information and more. In comparison, the RPI is not in the same league and cannot be considered a serious model.
Another important consequence of this difference in level of detail between the RPI and competing models is the ability to gauge an opponent's strength. Measuring this factor accurately is key to the validity of the final result: beating an opponent by 10 points is only really useful information if you can accurately gauge how strong that opponent actually was. Because these other (non-RPI) models explicitly measure each team's strength, they can use those measurements directly as the gauge of how strong their opponents are. The RPI, in comparison, is stuck in the scenario described above, where the measure of an opponent's strength is not even the opponent's full RPI, but an unattenuated partial RPI.
If it is still not clear that the RPI is an inferior model of team strength, please consider the following simple analogy. It attempts to illuminate the basic differences between the RPI and the other systems.
Two forest rangers, Jeff and Rip, are taken to a patch of woods and given the task of determining the relative height of every tree compared to the others, with the stipulation that only two trees can be compared at a time. Each tree will be measured at least once, but the total number of measurements to be made is undetermined.
Jeff decides that the most logical approach is that when he compares two trees, he measures each tree's actual height and records the data, so that he knows not only which of the two is taller, but by how much. As he proceeds, it soon becomes apparent that when a particular tree (say, A) has been compared to multiple other trees (B, C and D), the differences between A and the others can be used to assess the relative heights of trees which haven't been directly compared (i.e., B to C, B to D, C to D, and so on). Soon Jeff has an extremely accurate ranking of all the trees without having to compare every tree in the wood to every other one.
Rip, on the other hand, decides that if a tree is compared to others, the percentage of times it is found to be taller is a good measure of where the tree ranks within the entire woods. Actually measuring the height of each tree is deemed too 'complicated'. He sets about doing just that, but soon finds that he requires a much larger number of comparisons than Jeff did in order to achieve similar results. The results are also sometimes misleading: it turns out that the ranking of a particular tree depends heavily not only on how tall the tree itself is, but on the heights of the trees it was compared to. For example, a medium-sized tree is given a high rating simply because it has only been compared to a number of short trees. After thinking it through, Rip realizes that he will not be able to obtain an accurate ranking of each tree in the wood until he has compared every single tree to every other tree. Faced with this sobering proposition, Rip decides there has to be a better way to get the results to converge faster. In a moment of clarity, he determines that if he also takes into account the percentage of times the 'compared-to' trees were found to be taller, this will provide more accurate results. It helps somewhat, but his method still requires many more comparisons than Jeff's did, and the results are still sometimes misleading.
Unfortunately for Rip, time ran out before he could measure every combination of trees in the wood. Comparing his results with Jeff's, they found them very similar. However, Jeff's results were already exact once he had made a single comparison involving each tree in the wood (and were reconfirmed with each additional comparison). Rip's results were still not quite right, despite requiring many more comparisons than Jeff made.
Of course, in the above scenario Rip's method is analogous to the RPI rating method, while Jeff's method is essentially the basic starting point for models explicitly designed to measure team strength (e.g., Jeff Sagarin's or the Massey Ratings).
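Jeff's trick, chaining measured differences so that trees never directly compared can still be ranked, can be sketched in a few lines. This is a toy illustration (the names and numbers are invented), not any particular rating system; real systems like Sagarin's must reconcile noisy, inconsistent margins, typically via least squares, rather than propagating exact measurements:

```python
from collections import defaultdict, deque

def relative_heights(comparisons, root):
    """comparisons: list of (a, b, a_minus_b) direct measurements.
    Chains differences outward from `root`: knowing A-B and A-C
    yields B-C without ever comparing B and C directly."""
    graph = defaultdict(list)
    for a, b, d in comparisons:
        graph[a].append((b, -d))  # b's height = a's height - d
        graph[b].append((a, +d))
    heights, queue = {root: 0.0}, deque([root])
    while queue:
        node = queue.popleft()
        for nbr, offset in graph[node]:
            if nbr not in heights:
                heights[nbr] = heights[node] + offset
                queue.append(nbr)
    return heights

# Three comparisons, all against tree A, still rank all four trees:
h = relative_heights([("A", "B", 5.0), ("A", "C", 2.0), ("A", "D", -3.0)], "A")
ranking = sorted(h, key=h.get, reverse=True)  # ['D', 'A', 'C', 'B']
```

B and C were never measured against each other, yet their relative height (`h["C"] - h["B"]`) falls out of the chained data, which is exactly why Jeff converges so much faster than Rip.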
Some may ask: why, then, aren't these other models perfect indicators of team strength from the get-go? The answer is that 1) there is no direct way to measure something as ambiguous as team strength (as compared to measuring the height of a tree), 2) actual team strength changes as the season progresses, and 3) other factors come into play in college basketball which often allow weaker teams to beat stronger competition on a given night. More details on each follow.
1.) In order to assess something like team strength, there are literally hundreds of criteria which could potentially be used. How many points a team scores, the strength of each opponent and how well the team did against them, whether they win or not, how many points they win by, how strong their defense is, whether they play more than their share of games at home, on the road or on neutral sites, are all obvious ones (most of which, incidentally, aren't even considered by the RPI). Some criteria are stronger than others and deserve more weight, while others are weaker and often end up muddling the result by providing contradictory information. Sorting through which criteria are important, what kinds of variance are inherent in each measure, and how they should be factored in is the domain of the various models and is what differentiates them from each other. Most of the details are proprietary. So although the different systems will produce different results (depending on how each model is set up), most every one should be more accurate than the RPI, which is hopelessly simplistic.
2.) Another key point is that team strength changes from day to day and week to week during the season. Players gain more confidence, they learn the system and become better at executing it. Other teams may have internal strife or lose confidence in themselves or their coach. Winning streaks and losing streaks can do wonders to the psyche of a team. Other important events may be the reinstatement of a key player or an injury.
All of these events have a definite impact on actual team strength, although accounting for them is often the weak point of many mathematical rating systems. Many simply aren't designed to recognize, for example, when a key player is injured, and to reassess team strength accordingly. Even the ones that do may not be able to accurately determine how well the player's replacement will perform or how well his teammates will take up the slack. Most mathematical models (the RPI included) are completely reliant on the event eventually showing up in the results (i.e., winning percentage in the case of the RPI, margin of victory against a particular opponent in the case of Sagarin). The problem is that these events often happen during the season, and not enough games are played afterward to appreciably affect the ratings, especially in ratings which don't differentiate between recent games and games played early in the season.
This type of situation reveals another important point: it is foolish to rely completely on computer rankings. Weekly polls, maligned as they are, are actually much better at compensating for these types of events, and thus are (and should be) important factors in assessing how strong a team is compared to others (and therefore directly relevant as one tool for the NCAA Committee's seeding of teams). Another strength of polls is that they can easily dismiss teams which may look great on the computer screen but have obvious deficiencies when seen in person.
3.) Even beyond the above factors which can confuse mathematical models, there is a final reason which can play complete havoc with them: there are many unknown and unmeasurable factors which can cause a game's outcome to be the opposite of what might be predicted on paper. These can include anything from what a player had for dinner, a bad night's sleep, overconfidence, lack of focus due to outside influences, poor officiating, fouling out, a hot 3-point shooter, or a lucky bounce. The list is endless, and college basketball is particularly exposed (compared with sports like football) because there are far fewer players, so a factor affecting one of them can have a much greater impact on the game's outcome. No computer model can (or should) be expected to account for these factors.
The above demonstrates that there are countless reasons why even a perfect model cannot be expected to correctly pick the better team on any given night. The important point, for this paper, is that the factors which adversely affect the Sagarins of the world also adversely affect the RPI. So even if these other models aren't perfect, they're still ahead of the RPI.
Why Should the NCAA Scrap (or Deemphasize) the RPI?
There are five major reasons why the RPI needs to be reevaluated by the NCAA, and either completely overhauled or scrapped altogether. These are listed below.
The first reason deals with the basic flaws in the model itself [described in detail above], which affect its ability to reliably compute what it was intended to compute (i.e., a team's schedule strength and how the team fares against that schedule). There are certainly other models out there [most certainly created by people with a better grasp of mathematics and statistics than whoever dreamed up the RPI] which could do this more reliably.
I've Got a Secret
An interesting, and in a way disappointing, recent development has been the committee's use of a 'secret' adjustment, most likely in response to the fact that many RPI results defy logic for those expecting the RPI to be a measure of team strength. Although the exact details are not known, this adjustment is almost certainly an attempt to correct for the wide differences in talent between conferences within Division I. Some may claim this makes the RPI more accurate, and it probably does. But the fact that this kind of tacked-on correction is necessary at all suggests to me that the basic model is fundamentally flawed.
This 'secret' adjustment is cited by some RPI apologists as a counter to the critics. They note that although the freely available RPI lists (such as Jerry Palm's, which is not modified from the basic model) don't seem to square with reality, those aren't the 'real' RPI numbers the NCAA computes. However, since this 'real' list isn't made available and the formula is 'secret', there's no way one can argue against it. It's a perfect shell game.
Except for one thing. The actual RPI numbers as computed by the NCAA are released to the schools after the tournament. I personally haven't seen such a list and compared it to the unaltered lists such as Palm's, but Palm has, and he has told me that he hasn't seen significant differences between the two. So whatever 'secret' adjustments are being made, they aren't significantly altering what is in reality a very poor and fundamentally flawed model.
The second major strike against the RPI is that these weaknesses in the model allow the major schools to attain a relatively high RPI without really playing as difficult a schedule as the RPI would imply. The model's inability to distinguish between schools with good records in major conferences and schools with good records in mid-major or poor conferences can be exploited, as can its failure to account for factors like home games versus away games.
The Golden Rules
Below is an actual example of how a team can build a schedule, based on a few principles designed to exploit the RPI's weaknesses, to boost its RPI rating. This is excerpted from a post by Phil Beineke on rec.sport.basketball.college, December 12, 1999, titled "How to Beat the RPI".
HOW TO BEAT THE RPI WITHOUT REALLY PERFORMING
1) Play teams who have really good records, even if they're going to thrash you.
2) Duck teams who have really bad records, especially the weaker teams from the big conferences.
3) Stay at home.
In the 98-99 regular season, Purdue was 19-12, losing five of its last six games and finishing 7th in the Big 10. That doesn't sound good enough to make the tournament field of 64, let alone the Top 25, but if you look at the RPI, Purdue rang in at #23. They did it by using a schedule that (mostly) followed the three rules above.
One convenient aspect of the RPI is the way you can break it down game by game. If you score a .600 RPI one game and a .400 RPI the next, your RPI for the two games will be (.600 + .400) / 2 = .500. By this principle, a team gets an RPI score in every single game, and its overall RPI is the average of each of its single-game RPI's. (*)
Below these comments is Purdue's game-by-game RPI from last season.
Point (1) is exemplified by Gonzaga, Lafayette, St. John's, Butler, Valparaiso & Michigan State. Purdue's two double-digit losses to Michigan State actually *helped* them stay over the bubble.
Point (2) Notice that Purdue did better by losing to North Carolina than it did by splitting with Penn State or beating Illinois-Chicago. Purdue was wise not to schedule many games against the likes of Illinois-Chicago or UNC-Asheville.
Point (3) Look at all the home wins right at the top of the list; then look at the bottom of the list -- filled with @'s. The RPI doesn't weigh in home court advantage, so Purdue was penalized for every road game. Just look at South Carolina -- Purdue went and beat an SEC team on the road, and that actually *hurt* their chances!
PURDUE's 98-99 SEASON IN ORDER OF DECREASING RPI
Result   Opponent               GAME RPI   Opp. Win%   Opp. SOS
W Gonzaga .785 .828 .485
W St. John's (neutral) .784 .781 .577
W Lafayette .749 .769 .456
W Iowa .744 .692 .593
W Butler .743 .741 .491
W @Xavier .726 .700 .505
W Valparaiso .715 .708 .443
W Illinois State .659 .552 .531
W & W Illinois (h&a) .633 .467 .597
L & W Indiana (h&a) .618 .700 .573
W LaSalle .614 .481 .494
W Eastern Illinois .587 .444 .458
W & L Minnesota (h&a) .587 .640 .570
L & L Mich. St. (h&a) .580 .867 .585
--- tourney cut-off ~ .565 ---
W @South Carolina .537 .286 .575
W UNC-Asheville .518 .304 .462
L UNC (neutral) .508 .719 .594
W & L Penn St. (h&a) .505 .480 .559
W Ill.-Chicago .500 .259 .500
L & L Ohio St. (h&a) .493 .714 .545
L @Wisconsin .493 .700 .572
W & L Michigan (h&n) .464 .379 .598
L @Providence .402 .536 .537
L @Northwestern .389 .500 .555
(*) This is not precisely true; slight adjustments must be made when you face an opponent more than once.
(h&a) home & away
(h&n) home & neutral
This post brought on a few responses, some of which are included below to clarify some of the issues.
>This is an interesting demonstration, but what's your point? RPI, as you
>indicate, is mostly a matter of schedule strength. But it is just one of
>several factors that the selection committee weighs to determine the field of
>64.

Phil B. - ... but Purdue didn't face a particularly strong schedule of opponents. Instead, it raised its RPI by feasting on home games against Valparaiso, Lafayette, and Butler, who all had excellent winning percentages (in weak conferences) but who weren't a very stiff test for the Boilers.
>Would you rather see teams with cupcake schedules receive special
>consideration?

Phil - Well, I think the penalty for one or two cupcake games is way too big. Take last year's NIT champ Cal. They were excluded from the NCAA's because their RPI ranking was #58. Why so low? In part because they scheduled Eastern Kentucky, who went 2-23. Cal won the game 84-49, but if they hadn't played it at all, their ranking would've been #50, eight places higher!
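The game-by-game decomposition in Beineke's post can be checked directly against his table. Assuming each single-game RPI uses the same core weights applied per game (25% of the result, 50% of the opponent's winning percentage, 25% of the opponent's schedule strength), the sketch below reproduces the table's numbers; the caveat in his footnote about repeat opponents is ignored here:

```python
def game_rpi(won, opp_winpct, opp_sos):
    """Single-game RPI; per Beineke's post, the season RPI is
    (approximately) the average of these across all games."""
    return 0.25 * (1.0 if won else 0.0) + 0.50 * opp_winpct + 0.25 * opp_sos

# Two rows from the Purdue table above:
gonzaga = game_rpi(True,  0.828, 0.485)   # ~.785, matching the table
mich_st = game_rpi(False, 0.867, 0.585)   # ~.580, even in a loss

# A loss to a good team can score better than a win over a bad one,
# which is the heart of rules (1) and (2):
assert game_rpi(False, 0.867, 0.585) > game_rpi(True, 0.259, 0.500)
```

This makes the perverse incentive explicit: the result of the game is worth at most .250, while the opponent's paper credentials are worth up to .750.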
The third reason why the NCAA should rethink the RPI is that it is unfairly skewed toward the top conferences. Not that the top conferences aren't deserving of the majority of at-large bids; rather, even after a stellar regular season, unless a mid-major team wins its conference tournament, it is virtually locked out of an NCAA bid. Dan Wetzel, in "RPI continues to lack an ounce of reason" (CBS Sportsline, Jan. 9, 2002), argues that the RPI formula artificially props up teams in major conferences, while mid-majors steadily see their RPI ratings erode due to the overall lack of winning percentage in the conferences they happen to compete in. He also argues that because the top conferences hold a majority in the NCAA councils which decide these things, there is no incentive to do away with the RPI, since doing so would go against their financial interests. He writes:
"But the RPI formula has never provided an accurate picture of a team's demonstrated power. Its formula is strongly tied into opponents, therefore punishing teams who face off against terribly weak teams (regardless of margin of victory). Yet it overwhelmingly favors the nation's wealthiest conference, where even the worst team is capable of paying bad teams to play them, generating an at least decent win-loss record.
In conferences that aren't as wealthy, however, the worst teams can be an overwhelming drag on even an excellent club.
Thus it's a formula that effectively protects the revenue shares for major conferences by virtually guaranteeing the vast majority of at-large bids and favorable seeding."
Wetzel goes on...
"Statistically, playing a bad team that finishes 12-16 is four times more beneficial than a bad team that winds up 3-25. A game against a 27-4 team is nine times more beneficial.
Which means once a team gets into major conference play, the RPI of the lower conference teams inch down, regardless of result. Consider this season's Rutgers team, which after losing at home to Pittsburgh on Tuesday rose three spots in the rankings.
For mid-major teams, such upward movement is virtually impossible.
'Once we get into our league, I'm not sure how we can improve,' said Southern Illinois coach Bruce Weber, whose Missouri Valley team is 13-2 with an RPI of 48. 'We probably have to go 15-3 in the Valley just to stay where we are. You go on the road and win a game by 15; that's a pretty good win, but you might drop. It is scary.'
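Wetzel's multipliers are easy to check: an opponent's contribution to your schedule rating scales directly with its winning percentage, so (using the records he cites) the arithmetic works out to roughly the factors he quotes:

```python
# Winning percentages of the three hypothetical opponents Wetzel cites:
awful = 3 / 28    # a 3-25 opponent  -> ~.107
bad   = 12 / 28   # a 12-16 opponent -> ~.429
good  = 27 / 31   # a 27-4 opponent  -> ~.871

print(round(bad / awful, 1))   # 4.0 -- "four times more beneficial"
print(round(good / awful, 1))  # 8.1 -- Wetzel's "nine times" is a rough rounding
```

The point stands either way: the benefit of scheduling a team depends on its record, not on whether you beat it or by how much.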
Another point made in the article is that even for mid-majors with quality teams and quality out-of-conference wins (e.g., Gonzaga, Butler, Kent State), the drag their conference puts on their schedule rating not only makes it difficult to get into the tournament; even if they do make the dance, their seeding will be adversely affected. That realistically limits the revenue they can generate, since they must upset tougher first-round opponents than they should have to face in order to advance.
Beyond that, the converse of point #2 above is also true: while teams from the top conferences can manipulate their RPI ratings through clever scheduling ploys, that luxury isn't always available to their mid-major counterparts. In fact, a lack of scheduling options can often hurt a mid-major's rating. Another quote from Wetzel's article:
"Hence the team was forced to play three terrible squads -- Mt. St. Marys (319), Lipscomb (303) and Birmingham-Southern (292). The fact two of those games took place on the road and the Bulldogs won by an average of 25.3 points is of no concern to the computer.
In truth, Butler would have been better off not playing at all, scheduling Division II teams that don't factor in, or agreeing to play Kansas three times, using only walk-ons, and losing by 50 in each game. To the RPI, that would have demonstrated strength."
The fourth reason why the NCAA should rethink the whole RPI idea is that the people who should know the most about it apparently know the least. The media is hopelessly confused, continually suggesting the RPI is a measure of team strength when it's not. Dick Jerardi of the Philadelphia Daily News (after he saw the light) offers some insight into why much of the media clings to the RPI.
"The RPI, I was certain, explained everything. A college basketball team could be defined by a number, arrived at in an objective way. I really believed that.

I was wrong. The RPI, it turns out, is little more than a brainwashing tool used on those of us who want easy answers. I wanted easy answers." - Philadelphia Daily News Online Edition, "It's High Time for the NCAA to KO the RPI," January 23, 2002.
Even the coaches and college administrators, the very audience the RPI is supposed to spur toward scheduling tougher opponents, seem not to have a clue.
After the 1999 season, when the perennially powerful Atlantic Coast Conference was shocked to find that they received only three bids to the NCAA Tournament, the coaches met and were instructed by an "RPI Consultant" on what they needed to do in order to receive a higher RPI. (Tim Peeler,"Coaches Miffed by Virginia Snub," CNN/SI March 15, 2000).
The postscript to this story is amusing, as the ACC didn't appreciably change its non-conference scheduling habits. In fact, for the 2000 season, its "power ranking" dropped to an all-time low, and the conference again got only three bids to the tournament. [Not that scheduling is the only requirement for making the tournament; obviously the teams need to actually produce on the floor.] An exasperated Coach Mike Krzyzewski was quoted as saying, "I wish our conference office would get a detailed report from the committee so our conference could know what it is that has given the impression that only three of our teams are worthy. Other conferences have had six, five, four teams. Maybe there are some things we don't know about." (Peeler, ibid.)
The fifth reason why the NCAA should drop the RPI is that it is becoming apparent, even to some of the more clueless journalists, that the NCAA is not being completely honest about the criteria it uses.
The Shell Game
In the March 3, 2003 issue of Sports Illustrated ("A Madness to the Method?"), Alexander Wolff wrote an article discussing the RPI. He did a good job of gaining access to the major players: Jim Sukup, Jerry Palm, Jeff Sagarin, and Gary Johnson, the senior assistant director of statistics at the NCAA, who was in charge of maintaining and tweaking the 'official' RPI. Despite much confusion on Wolff's part, some very telling nuggets of information still emerged.
The most telling were the revelations that since 1984, the committee has requested Sagarin's ratings for its deliberations. Beyond that, according to Sagarin, a committee member once confided to him that the committee does indeed use and appreciate his rankings. This is certainly understandable, since the Sagarin ratings are well respected and are specifically designed to reflect team strength, exactly the most useful measure for choosing among prospective at-large squads.
According to Sagarin's account, the committee member added that although the committee likes his ratings, "we can't say that." The reason: Sagarin's ratings take margin of victory into account. Wolff explained earlier in his article that the NCAA is averse to anything which seems to reward margin of victory, as it encourages coaches to run up the score, and it smacks of the point spreads often associated with gambling, an influence which has been and remains a threat to college basketball and to sports in general.
From the above, it appears that the NCAA committee is indeed using more sophisticated and accurate measures of team strength and schedule strength than the RPI; they're just not telling anyone. For anyone who has compared the teams the NCAA has actually invited to the tournament, and where they've been seeded, this revelation shouldn't be surprising at all. The RPI has consistently fared poorly on that test, indicating that other factors besides the RPI play important roles in the committee's decisions.
As a postscript to the magazine piece, Wolff wrote a follow-up article ("Replacing the RPI," SI.com, March 1, 2003) in which he expressed even more misgivings about the RPI than he had in the magazine and finally began to see the issue much more clearly. Wolff wrote:
"In fact, I came away from this story persuaded that fans do indeed put more stock in the RPI than the committee does, just as the NCAA has insisted for years. Further, I now believe that plenty of thoughtful committee members past and present fear that they've created a monster, and many would love to swap the RPI for something better. Recent members from big and not-so-big conferences alike, men like Big East commissioner Mike Tranghese and Missouri Valley commissioner Doug Elgin, have gone public about their RPI misgivings."
The fact remains that the RPI has taken on a life of its own, and its actual purpose and use by the committee have been lost in the hype. Unfortunately, the NCAA has never been an organization good at admitting mistakes. Instead of acknowledging these inherent problems or scrapping the system outright (in favor of a more mathematically sound model of schedule strength), it avoids discussing the RPI in much detail, and fails to explain, both to the media and to the coaches (whom the RPI is supposed to influence), what exactly the RPI is meant to measure and how it is used.
Now That I'm Confused, What Is the RPI Exactly Again ?
The following is all my opinion; take it for what it's worth. The RPI was created by the NCAA Tournament Committee to help differentiate teams when determining which bubble teams make the final cut, and sometimes to choose between two otherwise equal teams for a particular seed. It is one of many criteria at the committee's disposal, no more, no less, and not even one of the major criteria IMO. The RPI itself is more a measure of a team's schedule strength (i.e., it tries to rank teams by how difficult their schedule was), with a portion added for the team doing well and for not scheduling obviously inferior teams. I believe the reason the NCAA began to publicize the information was to persuade teams to schedule tougher opponents than they might have in the past. The RPI was simply the NCAA's way of telling bubble teams that they need to toughen up their schedules.
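Though the NCAA's exact adjustments are unpublished, the commonly cited core formula weights a team's own winning percentage at 25%, its opponents' winning percentage at 50%, and its opponents' opponents' winning percentage at 25%. A minimal sketch of that core (the component values below are invented for illustration, not any real team's numbers):

```python
def rpi(wp, owp, oowp):
    """Core RPI: 25% own winning %, 50% opponents' winning %,
    25% opponents' opponents' winning %. The NCAA's unpublished
    adjustments are ignored here."""
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp

# A hypothetical 22-8 team with a strong schedule:
print(rpi(22 / 30, 0.640, 0.560))  # ~0.643
```

Note that the two schedule terms together contribute 75% of the rating, which is the point made above: most of the number reflects what a team's opponents did, not what the team itself did.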
From the Horse's Mouth
Below is an excerpt from the NCAA News (February 8, 1995) describing the committee's use of the RPI, quoting the chairman at the time. It comes from Jerry Palm's College RPI site, and I believe it supports some of my thoughts above.
Since the RPI's invention, the committee has come to rely on it less than when it was first created. At the same time, the perception grows that the RPI is the only tool the committee uses to select at-large teams for the tournament. Committee chair and University of Kansas athletics director Robert E. Frederick probably will be heard saying this a time or two during selection Sunday: "The RPI is just one of many things we use. I know it's been said before, but it really is true."

Frederick said the perceived importance of the RPI is understandable, since it is a quantifiable measure. But the men's basketball committee uses it more as a comparison tool when faced with filling the few remaining openings in the bracket -- "the nitty gritty," as it's referred to by the committee.

"My observation has been that we've gotten more and more away from the reliance on the RPI. It has a great value to us in putting teams in categories 1 through 25 or 1 through 50, so that we can look at the teams and how they did against the top 25, the top 50 or top 100. It's really valuable in that regard, but as a single factor, it's not significant.

"We've made a conscious effort to delay the introduction of the RPI into the process. We cast an initial ballot, in which each member of the committee selects 34 at-large teams, which is turned in on the night before we start the process. We still have not seen the RPI at that point. We don't get all that information until Friday (before selection Sunday)," Frederick said.
A more recent committee chairman, C.M. Newton, is also quoted as saying that the RPI is 'one of many' criteria used by the committee. (JPS Note - I would appreciate it if someone could supply me with the exact reference.)
And more recently, the 2001 NCAA Tournament Committee chairman, Big East commissioner Mike Tranghese, was questioned about some of the things which are important to the committee. He cited as very important the Coaches Advisory Poll, a poll put together at the end of the season in which coaches vote on and rank only the teams in their own region. When asked about the RPI by The Sporting News, this is what he said.
TSN: What about the RPI? Is that as important as folks make it out to be ?
MT: It's a tool and it clearly is used, but I think the RPI has taken on a life of its own. It's been made out to be something that I don't believe it is. I don't believe the RPI gets people in or gets people out. It's the factors that comprise the RPI that eventually get people into the tournament or get people out. The RPI is just something that people in the media can put their hand on. But at the end of the day, when you're trying to decide who that 34th team is, the differences between the 34th team and that 35th at-large team are all in the eyes of the beholder.
- from "March turns mad for Big East commissioner Tranghese", by Jeff D'Alessio, The Sporting News, March 1, 2001.
The Truth is Staring You in the Face
Earlier in the page I briefly alluded to various web sites and services which have helped push and publicize the RPI, and I am critical of them for the way they tend to intermingle the RPI with what the committee does and where teams are seeded (a connection I find completely overblown). However, it is interesting to note that despite this, some of these sites have been very candid about what the RPI actually is. Jerry Palm of College RPI, for example, has clearly stated in his RPI FAQ that,
"The RPI (Ratings Percentage Index) is a measure of strength of schedule and how a team does against that schedule. It does not consider things like margin of victory or where a game is played, only whether or not a team won. It is used by the NCAA as one of their factors in deciding which teams to invite to the NCAA Tournament and where to seed them."
Beyond that, Mr. Palm has mentioned to me,
"I don't encourage people to use this [the RPI] for ranking teams. If anything, I discourage it. I bend over backwards to try to explain that this is NOT designed to be a definitive ranking of teams and the selection committee does not use it that way either."
With the number of college basketball fans who closely follow the RPI rankings on the above (and other) sites, one would expect at least some of them to take the time to read what the RPI actually is. For whatever reason, many basketball fans don't. (P.S. I'd like to take the opportunity to thank Mr. Palm for providing information about the RPI and historical rankings on his site.)
Where the Rubber Meets the Road
To see how all this works, it is probably instructive to run through some real-life examples comparing RPI results with actual NCAA fields. Below are links to comparisons of the actual NCAA fields for the 2000-01 and 2001-02 seasons with various rating models and polls, including the RPI.
Take for example the 2000-01 results, where the actual seeds as determined by the NCAA committee were correlated with various ratings (such as the RPI, Sagarin and Massey) and polls (such as the AP) to see which measure fared best. In that season, the polls did a better job of correlating with the NCAA's seeds than any of the mathematical models, and the RPI was the worst predictor of where teams were seeded among all the models.
This evidence flies directly counter to the prevailing 'wisdom' spouted by media outlets about the importance of the RPI to NCAA seeds, yet it shouldn't be surprising to those who have read this page. After all, of all the models, the RPI is the one which isn't really even a measure of team strength, but is instead more a measure of schedule strength. So the real question is: why should the RPI correlate well with the NCAA seeds ? The answer is that although there is some commonality which helps the RPI correlate, it really doesn't have to.
(As an aside, the main conclusion from these correlations is that no single measure correlates best with the NCAA's seeding decisions. In fact, it is far better to take all the quality ratings and polls and average them, since that average turns out to produce a far better correlation.)
In the example of the 2001-02 season, the RPI fares better (or perhaps more correctly, the competing models fare worse) than the previous year. When looking at the top-6 seeds (basically the top 25), the RPI was once again the worst predictor, at least among the mathematical models (the ESPN/USA Today poll was the worst overall). However, when looking at the top 12 seeds (basically the major conference champions along with the at-large teams that year), the RPI improves so much that it jumps over the Massey rating in terms of correlation with the NCAA field of top-12 seeds. Looking back at the 2000-01 results, one can see that the RPI likewise made a healthy gain in its correlation when going from the top-6 seeds to the top-11 seeds (at least in comparison to the other models), although this still wasn't enough to pull it out of last place in 2001.
Why might this be happening ? The RPI's correlation improves so much more in going from the top-6 seeds to the 'at-large' seeds than that of competing models (such as Sagarin and Massey) that it appears the RPI does exert some appreciable influence on the NCAA committee when it comes to this particular set of teams (i.e., bubble teams). Again, for someone who has read this page, this shouldn't be all that surprising, since it has been outlined above that the NCAA committee does indeed use the RPI as a tool, especially when considering which bubble teams deserve an at-large bid.
So the take-home from these examples is that when someone calls the RPI an important determinant of seeding, the claim needs qualification. First, it is far better to look at the aggregate picture of a team (i.e., where it is ranked in the various polls and in the Sagarin, Massey, RPI ratings, etc.) than to base expectations on any single measure (including the RPI). Beyond that, the RPI really isn't very important for at least the top 6 seeds (basically the top 25 teams in the nation) and is in fact likely the poorest predictor of seedings for those teams. While it does have some measurable influence on the later seeds, it's still not much better (if not worse) than competing models. So in reality only fans of bubble teams should really concern themselves with their team's RPI ranking, and even then it's far from the crucial measure the media sometimes makes it out to be.
Discrepancy, or Just Common Sense ?
To see whether you were paying attention, consider the following scenarios.
Q - Consider a nationally ranked top-10 team playing a team from a low-level conference (which happens to be the conference leader, and therefore has a very respectable record). The top-10 team beats them by 25 and its RPI improves (even though the RPI doesn't care how much they won by). However, if the same top-10 team played a team in the same low-level conference but with a poor record and beat them by 35, its RPI might plummet. This happens not because the margin of victory was too small, or really even because of whether they won or not (since a team's own record accounts for only 25% of the RPI, and even then the formula doesn't differentiate a loss to a bottom-dweller from a loss to another top-10 team). Their RPI dropped because they played a team with a terrible winning %. Now seriously, why should whether they beat a team ranked #175 (with a good record) vs. a team ranked #250 (with a bad record) significantly change the rating of how strong the team is ?
A - The answer is that it shouldn't and this is actually consistent with the theory of the RPI, contrary to what many people assume the RPI is meant to measure.
If one thought the RPI was a measure of how strong a team is, then the above scenario would be an outrage. After all, assuming the top-10 team won, the actual strength of the team had absolutely no effect on the rating. Whether they won by 1 or 100, it wouldn't matter.
If one thought the RPI was mainly a measure of a team's schedule strength, then while the above scenario might still be considered unfair (the unfairness can be traced to the inadequacies of the RPI model itself, BTW), it is more understandable in that it at least discourages teams from playing really bad teams. So in effect the RPI rating dropped, not because the team's strength had magically changed, but because they were stupid enough to schedule that opponent and were being punished for it.
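To make the scenario concrete, here is a toy calculation under the core 25/50/25 formula (all records below are invented, and the real opponents'-winning-% calculation excludes games against the rated team, which is omitted here for simplicity). Beating the weak league's leader versus its bottom-dweller changes only the opponents'-winning-% term, yet that term carries twice the weight of the team's own record:

```python
def rpi(wp, owp, oowp):
    # Core RPI weights: 25% own record, 50% opponents' record,
    # 25% opponents' opponents' record (NCAA adjustments ignored)
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp

# A hypothetical 20-5 team whose 25 opponents so far average a .600 record
base_opponents = [0.600] * 25
oowp = 0.550  # held fixed for simplicity

def after_game(new_opponent_wp):
    # Rating after a 26th game (a win) against one more opponent
    opponents = base_opponents + [new_opponent_wp]
    owp = sum(opponents) / len(opponents)
    return rpi(21 / 26, owp, oowp)

beat_leader = after_game(0.800)   # win over the weak league's leader
beat_cellar = after_game(0.150)   # win over its bottom-dweller
print(beat_leader - beat_cellar)  # ~0.0125: same win, lower rating
```

The team's own performance is identical in both cases (a win either way), yet the rating differs purely because of the defeated opponent's record, which is exactly the behavior the scenario above describes.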
The second example of an apparent discrepancy between perception and reality of the RPI is illustrated below.
Q - The 1999-2000 Kentucky Wildcat team played a very difficult schedule. Despite a disappointing regular season record of 22-8, they were ranked an amazing #2 in the RPI for a good part of the season. This, despite most every poll and computer ranking having them generally ranked from the lower to upper teens. How could this be possible ?
A local sportswriter was surprised by this also, and wrote an article about it.
And, in the ongoing riddle of the season, how in the name of Wayne Turner can Kentucky still be No. 2 in the Ratings Percentage Index computer rankings ?
"I was wondering when somebody was going to ask that (last) question," said Jeff Sagarin, who provides another set of computer rankings to the NCAA Tournament selection committee.
The RPI is looking like the Really Preposterous Index. The formula, created by the NCAA and copied by Jim Sukup of College Basketball News in Carmel, Ind., continues to confound mathematical logic, hoops evidence and two-bit common sense by placing the Wildcats (20-8) behind only Cincinnati.
IF THE RPI is accurate, that means the Wildcats would be one of the four No. 1 seeds when the NCAA brackets are announced March 12. - by Rick Bozich, "With UK No. 2, RPI Does Not Compute," Louisville Courier Journal, February 29, 2000.
A - This is really not surprising at all, given the toughness of UK's schedule and the fact that only 25% of the rating comes from the team's own record, which while not great was not bad either. In fact, by the end of the season Kentucky had amassed so much RPI clout from its schedule, and continued to play quality conference opponents, that whether it won or lost had very little effect on the overall rating, something which seemed to surprise some Kentucky fans. Bozich also suggests that the RPI is used to place the number one seeds, something contradicted by the statement above from Mr. Frederick. As a professional journalist, Bozich should know better, but then he's not alone.
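The arithmetic behind this is easy to sketch. Under the core 25/50/25 formula, once the two schedule terms are essentially locked in, a win versus a loss in, say, a 30th game shifts only the own-record term (the schedule components below are invented for illustration, not Kentucky's actual values):

```python
def rpi(wp, owp, oowp):
    # Core RPI: 25% own record, 50% opponents', 25% opponents' opponents'
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp

owp_val, oowp_val = 0.620, 0.560  # hypothetical strong-schedule components

with_win = rpi(23 / 30, owp_val, oowp_val)
with_loss = rpi(22 / 30, owp_val, oowp_val)
print(with_win - with_loss)  # 0.25 * (1/30), roughly 0.008
```

A swing of less than a hundredth of a rating point from an entire game's result is why a team with a tough enough schedule can sit near the top of the RPI almost regardless of how it fares down the stretch.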
If anyone has any comments or suggestions, please mail me. I'd particularly be interested in any direct quotes from NCAA Committee members regarding how the process works and how much weight they actually give to the different criteria. I'd also be interested in any statistical analysis comparing RPI results with other computer rankings, and how this correlates with things like polls, tournament success etc. I did an analysis of the 2001 NCAA Tournament which you might be interested in.
Return to Kentucky Wildcat Basketball Page.
Please send comments to Jon Scott