The essential guide to predictive college football rankings

How do you determine the best team in college football? Which four teams should make the playoffs?

College football rankings can help you answer these questions, but only if you find the right ones. With a good ranking, a higher ranked team should beat a lower ranked team more often than not.

This article looks at the rankings you should take seriously in making predictions on college football games, whether you’re in a weekly pool, bet on games or just need to feel smart in front of your friends. Analytics also shows which rankings you can safely ignore.

The results on the predictive power of rankings are often surprising and counterintuitive. If you believe most of the conventional wisdom you hear on ESPN, you might want to stop reading right now.

The results below ask you to open your mind to new possibilities. For example, the preseason AP poll is not only useful during the season but makes good predictions on bowl games. This might seem crazy, but I’ll back it up with data below.

For the curious fans with the open mind, let’s get started.

List of trusted college football rankings

For quick reference:

College Football Playoff committee rankings

This committee of 13 people with backgrounds in college athletics has clear importance. Their rankings not only determine the four teams for the College Football Playoff but also influence the matchups for the New Year’s Six bowl games.

With only a few years of data, it’s not possible to say anything of significance about how often a higher ranked team wins a playoff or bowl game. However, there is data to suggest these rankings have predictive power.

The NCAA men’s basketball tournament has used a selection committee similar to the College Football Playoff committee to select the field and assign a seed to each team. From 2002 through 2017, the team with the higher seed has won 72% of tournament games (716 wins, 279 losses, with no prediction in 50 games in which both teams had the same seed).

Let’s stop to appreciate this predictive accuracy. The selection committee consists of athletic directors and conference commissioners. There’s no requirement for coaching experience or a background in analytics. Yet over a huge sample of games, the higher seed wins more than 7 of every 10 games.

Will the College Football Playoff committee do this well with their rankings? Some factors point in their favor. This committee meets every week starting in late October until the season ends in early December. They need to rank 25 teams, not the sixty-some teams of the NCAA tournament.

However, there are other factors working against the playoff committee. College football provides only 12 or 13 games each season to evaluate a team. With this small sample size, teams can look much better by their record than they deserve.

As an example, consider Florida State in 2014. The Seminoles won the BCS title the previous year and returned Heisman winning QB Jameis Winston.

However, the defense declined in 2014, and Florida State no longer dominated opponents. They had close calls against Notre Dame, Miami and Georgia Tech. “All they do is win,” said their supporters.

Florida State went 13-0 and won their conference championship. The committee ranked them third behind two one-loss teams (Alabama, Oregon). The Seminoles fell apart in the playoff semifinal against Oregon, losing 59-20.

Only time will tell whether the College Football Playoff committee can be as good as the selection committee for March Madness.

The Preseason AP and Coaches polls

The preseason polls might seem worthless for making predictions. The humans of AP and Coaches have no games upon which to base their ballots. It seems more reasonable to wait until later in the season to look at these polls.

However, this is a mistake. The preseason AP and Coaches polls have remarkable predictive power, even during bowl season. Human polls from later in the season do not.

To show this, we ask how often the higher ranked team in the poll beat a lower ranked team in a bowl game. In this study, I rank teams beyond the top 25 based on points earned from pollsters, and ranked teams are predicted to beat unranked teams.

Over the past 10 years, a sample of 339 bowl games, the preseason Coaches poll predicted 59.9% of bowl game winners (163-109 with no prediction in 67 games with two unranked teams). The AP poll didn’t do much worse at 58.8% of winners (154-108 with no prediction in 77 games).

To put this in perspective, the team favored by the closing line in the gambling markets won 61.5% of games according to The Prediction Tracker (208-130 with no prediction in one game). The visual shows these results.

[Figure: preseason poll accuracy in predicting bowl game winners]

Note the prediction accuracy of the polls before the bowls is less than the accuracy of preseason polls.
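The percentages above follow directly from the win-loss records quoted in the text. As a quick check, here is a minimal Python sketch; the function name accuracy and the records dictionary are my own labels for illustration, while the numbers are the ones given above.

```python
def accuracy(wins, losses):
    """Fraction of predicted games in which the favored team won."""
    return wins / (wins + losses)

# Records quoted in the text: (correct picks, incorrect picks).
records = {
    "Preseason Coaches poll": (163, 109),
    "Preseason AP poll": (154, 108),
    "Closing line": (208, 130),
}

TOTAL_GAMES = 339  # bowl games in the sample

for name, (wins, losses) in records.items():
    skipped = TOTAL_GAMES - wins - losses  # games with no prediction
    print(f"{name}: {accuracy(wins, losses):.1%} "
          f"({skipped} games without a prediction)")
```

Games without a prediction are left out of the denominator, which is why each record sums to fewer than 339 games.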

The remarkable predictive power of preseason human polls most likely comes from the wisdom of crowds. No one sports writer or coach can create a perfect ranking. However, combining the ballots of many humans cancels out the small errors made by each one.
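To see why averaging helps, consider a toy simulation. This is only a sketch under strong assumptions: each voter sees a team's true strength plus independent random noise, and the voter count, team count and noise level are made up for illustration. Real ballots are ranks, not strength estimates, and voters' errors are unlikely to be fully independent.

```python
import random

def simulate(num_voters=60, num_teams=25, noise=5.0, seed=0):
    """Compare the average voter's error with the error of the consensus."""
    rng = random.Random(seed)
    # Hypothetical true team strengths, in points.
    truth = [rng.gauss(0, 10) for _ in range(num_teams)]
    # Each ballot is the truth plus that voter's own independent errors.
    ballots = [[s + rng.gauss(0, noise) for s in truth]
               for _ in range(num_voters)]
    # The consensus averages every voter's estimate for each team.
    consensus = [sum(b[i] for b in ballots) / num_voters
                 for i in range(num_teams)]

    def mean_abs_error(estimate):
        return sum(abs(e - s) for e, s in zip(estimate, truth)) / num_teams

    individual = sum(mean_abs_error(b) for b in ballots) / num_voters
    return individual, mean_abs_error(consensus)

individual_error, consensus_error = simulate()
print(f"average voter error: {individual_error:.2f}")
print(f"consensus error:     {consensus_error:.2f}")
```

Under these assumptions the consensus error can never exceed the average voter's error, and with many independent voters it is much smaller.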

Don’t forget about preseason expectations. To see the preseason AP and Coaches polls for 2015, click here.

Points based computer rankings

There are so many college football computer rankings. Ken Massey compiles over a hundred of them on his site. How do you distinguish the rankings that make good predictions from those that do not?

There’s a simple trick for sorting the good from the bad rankings, and it comes from considering two ideas: strength of schedule and margin of victory.

The college football playoff committee has made strength of schedule a buzzword. How does one evaluate a team in the context of which teams they have played? Computer rankings are a numerical approach to answering this question.

Margin of victory doesn’t get discussed as much as strength of schedule. This lack of attention may stem from the old Bowl Championship Series. In the name of sportsmanship, the BCS did not allow its computer polls to consider margin of victory, a rule meant to deter teams from running up the score.

Which idea matters more: strength of schedule or margin of victory?

To test this with data, we can construct rankings that consider neither, one or both of these factors. Consider the following metrics for ranking teams.

  • Win percentage. Fraction of games won. Considers neither strength of schedule nor margin of victory.
  • Colley Matrix. A computer poll that takes wins and losses and adjusts for strength of schedule.
  • Raw margin of victory. Points scored minus points allowed divided by number of games, a raw number that makes no adjustment for schedule.
  • Simple Rating System. A least squares ranking system that takes margin of victory and adjusts for strength of schedule.
  • The Power Rank. An algorithm I developed that takes margin of victory and adjusts for strength of schedule.
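Three of these metrics are simple enough to sketch in a few lines of Python. The game results below are hypothetical, and the Simple Rating System is written in its commonly described form (a team's rating equals its average margin plus the average rating of its opponents), solved here by repeated substitution. I am assuming that form matches the least squares system meant above. The Colley Matrix and The Power Rank are not reproduced.

```python
from collections import defaultdict

# Hypothetical results: (team, opponent, points for, points against).
GAMES = [
    ("A", "B", 28, 14),
    ("B", "C", 21, 17),
    ("C", "A", 10, 31),
    ("C", "D", 35, 7),
    ("D", "B", 13, 20),
]

def team_logs(games):
    """Map each team to a list of (opponent, margin) pairs, one per game."""
    logs = defaultdict(list)
    for team, opp, pts_for, pts_against in games:
        logs[team].append((opp, pts_for - pts_against))
        logs[opp].append((team, pts_against - pts_for))
    return logs

def win_percentage(logs):
    """Fraction of games won; ignores schedule and margin."""
    return {t: sum(m > 0 for _, m in g) / len(g) for t, g in logs.items()}

def raw_margin(logs):
    """Average point differential per game; ignores schedule."""
    return {t: sum(m for _, m in g) / len(g) for t, g in logs.items()}

def simple_rating_system(logs, iterations=200):
    """Iterate rating = average margin + average opponent rating."""
    margin = raw_margin(logs)
    rating = dict(margin)
    for _ in range(iterations):
        rating = {
            t: margin[t] + sum(rating[opp] for opp, _ in g) / len(g)
            for t, g in logs.items()
        }
        # Ratings are only defined up to a constant, so center them on zero.
        mean = sum(rating.values()) / len(rating)
        rating = {t: r - mean for t, r in rating.items()}
    return rating

logs = team_logs(GAMES)
for team, rating in sorted(simple_rating_system(logs).items()):
    print(f"{team}: win pct {win_percentage(logs)[team]:.3f}, "
          f"raw margin {raw_margin(logs)[team]:+.2f}, SRS {rating:+.2f}")
```

In this made-up schedule, team B has the better record than team C and won their head-to-head game, yet C rates higher by both raw margin and the Simple Rating System because of its lopsided win over D. The iteration converges here because every team is connected through the schedule; a real implementation would more likely solve the linear system directly.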

The visual shows how often each of these rankings predicted the winner in 339 bowl games from 2005 through 2014.

[Figure: accuracy of each ranking in predicting bowl game winners]

Win percentage is hardly better than flipping a coin for each bowl game. The Colley Matrix does better than win percentage but not nearly as well as raw margin of victory. Strength of schedule without margin of victory results in poor rankings for making predictions, and you should avoid these rankings.

The two algorithms that take margin of victory and adjust for strength of schedule perform the best and almost as well as the closing spread from the markets (61.5%).

Let’s look at two recommended points based computer rankings that make good predictions.

  • The Power Rank. A method I developed based on research in statistical physics. For more details, click here.
  • Sagarin Ratings. Jeff Sagarin combines three different types of points based computer rankings for his college football rankings.

Play by play based computer rankings

Modern college football rankings go beyond the final score and use the play by play data from each game. I recommend the following rankings.

ESPN’s Football Power Index

ESPN’s analytics group has developed college football rankings based on the idea of expected points added (EPA), or the notion that each play of a game has a point value.

To understand EPA, suppose a team has a 1st and 10 at their own 20 yard line. They could drive the length of the field for a touchdown for +7 points or kick a field goal for +3 points. In the worst case, an interception gets returned for a touchdown, netting -7 points for the offense.

Given a down, distance and field position, the offense’s expected points is an average of the net points of the next score. For example, Brian Burke of ESPN has used NFL play by play data to determine that 1st and 10 from a team’s 20 yard line gives +0.3 expected points.

Expected points added (EPA) is the points gained or lost from a play. For example, suppose the offense gains 20 yards from that 1st and 10 from their own 20 yard line. Burke calculates 1.3 expected points for a 1st and 10 from their own 40. Since the offense started in a situation with +0.3 expected points, they had +1.0 EPA for this play.
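The arithmetic of that example can be written as a tiny Python sketch. The lookup table holds only the two expected point values quoted above; a real model would cover every down, distance and field position, and the names EXPECTED_POINTS and expected_points_added are mine, not ESPN's.

```python
# Expected points for a 1st and 10, keyed by the offense's own yard line.
# Only the two values quoted from Brian Burke's NFL numbers are included.
EXPECTED_POINTS = {20: 0.3, 40: 1.3}

def expected_points_added(start_yardline, end_yardline, table=EXPECTED_POINTS):
    """EPA of a play: expected points after the play minus before it."""
    return table[end_yardline] - table[start_yardline]

# A 20 yard gain from a 1st and 10 at the offense's own 20 yard line.
epa = expected_points_added(20, 40)
print(f"EPA: {epa:+.1f}")  # EPA: +1.0
```

A play that loses yards or ends in a turnover moves the offense to a situation with lower expected points, so its EPA is negative.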

ESPN uses EPA in college football for their FPI rankings, numbers meant to make predictions looking forward. They use the Simple Rating System, a least squares method for ranking teams, to adjust EPA for strength of schedule.

To check out ESPN’s FPI, click here.

Bill Connelly’s S&P+

Bill Connelly, SB Nation’s college football analytics guru, writes a preview for each and every FBS team, even New Mexico State. These treasures have become the only team previews I read each season.

Connelly’s numbers inform his writing as he ranks college football teams based on four factors.

  • Explosiveness – Measured by equivalent points per play, a metric similar to the expected points added used by ESPN’s FPI.
  • Efficiency – Measured by success rate, the fraction of plays that gain at least 50% of the necessary yards on 1st down, 70% on 2nd down, and 100% on 3rd and 4th down.
  • Finishing drives – Measured by points per trip inside the opponent’s 40 yard line.
  • Field Position – Measured by average starting field position, a number affected by special teams.
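The success rate definition in the list above translates directly into code. Here is a minimal Python sketch using the thresholds from the text; the sample drive is hypothetical, and this is not Connelly's actual implementation, which also adjusts for strength of schedule.

```python
# Percent of the necessary yards a play must gain to count as a success.
REQUIRED_PERCENT = {1: 50, 2: 70, 3: 100, 4: 100}

def is_success(down, yards_to_go, yards_gained):
    """True if the play gained the required share of the yards to go."""
    # Integer arithmetic avoids floating point rounding at the threshold.
    return 100 * yards_gained >= REQUIRED_PERCENT[down] * yards_to_go

def success_rate(plays):
    """plays: iterable of (down, yards_to_go, yards_gained) tuples."""
    plays = list(plays)
    return sum(is_success(*play) for play in plays) / len(plays)

# Hypothetical sequence of plays.
drive = [
    (1, 10, 5),  # success: 5 yards is 50% of 10
    (2, 5, 3),   # failure: 3 yards is less than 70% of 5
    (3, 2, 2),   # success: gained all 2 yards
    (1, 10, 4),  # failure: 4 yards is less than 50% of 10
]
print(success_rate(drive))  # 0.5
```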

His method takes each of these factors and adjusts for strength of schedule. These four factors are combined to make the final rankings. Connelly provides a sense for the importance of each factor in his original article on football’s five factors.

Before the 2015 season, Connelly’s rankings depended on only success rate and points per play, which gives the term S&P. He has kept the same name despite adding two additional factors to the calculation.

To check out S&P+, click here.

Massey-Peabody

Cade Massey, a professor at the Wharton School of Business, and Rufus Peabody, a professional sports gambler, have developed football rankings based on a simple idea. In their words:

“We use only four statistics – one each for rushing, passing, scoring and play success. Rather than creating esoteric new stats (not that we aren’t occasionally impressed with those), we focus on ‘cleaning up’ these relatively basic stats and then finding the appropriate weight for them in our model.”

Most likely, they use yards per play for the rushing and passing numbers. The scoring component is similar to the points based rankings mentioned earlier. Finally, the play success statistic is probably similar to either Bill Connelly’s success rate or the expected points used by ESPN.

Combining these metrics leads to powerful rankings. I use a similar ensemble method in the college football rankings and predictions for members of The Power Rank, and I most often check my results with those of Massey-Peabody.

To see the top 35 college football teams in Massey-Peabody, click here. They also publish NFL rankings.

Fremeau Efficiency Index

Brian Fremeau uses points per possession to evaluate teams in football. It starts by comparing the points earned on a drive with the expected number of points based on starting field position.

Accounting for starting field position is important. For example, if the offense gets the ball only a yard from the end zone, they should not get full credit for scoring the touchdown. Instead, the offense gets 7 minus the expected 6.4 points teams usually score from the opponent’s one yard line.
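As a worked example of that subtraction, here is a short Python sketch. The function name drive_value is my own, and the only expected points figure used is the 6.4 quoted above; Fremeau's full method involves more than this single step.

```python
def drive_value(points_scored, expected_points):
    """Credit for a drive: points scored minus points expected at its start."""
    return points_scored - expected_points

# Touchdown on a drive that started at the opponent's one yard line,
# where teams usually score 6.4 points according to the text.
print(round(drive_value(7, 6.4), 1))  # 0.6

# Coming away with no points from the same spot costs the offense 6.4.
print(round(drive_value(0, 6.4), 1))  # -6.4
```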

Fremeau publishes his drive based numbers both on his own site and Football Outsiders. The latter site also combines FEI with S&P+ to obtain the F/+ rankings, an aggregate picture of team, offense and defense in college football.

Feedback

Have a question or know of other rankings that should be included? Send me an email here.