how perfectly swell: matthew prins (or matt prins, or thew, or...oh, you don't care) alone with his stupidity

I'm only allowed one great idea a year, and this is it?

(Much logical thinking ahoy; please ignore if you're not that type.)

Like all sports fans who are also mathematically inclined, I've on occasion considered building my own computer rankings model for, say, NCAA basketball. (Like Ed did.) My main objection to the current college basketball computer models is that they do a very poor job of comparing a 24-5 small conference team with an 18-11 ACC team, since there's little overlap in quality of teams played; it's basically the programmers' whims that choose which team is better. (Don't even get me started with the RPI, which is as arbitrary as computer "models" get.) The better rankings try to approximate the comparison by using some sort of point differential, but that's problematic in any sport where the only goal is to outscore the other team, whether by 1 or 50. (Unfortunately, point differential almost has to be used in any decent college football model, due to the paucity of games played.) I never came up with a good idea for how to achieve the goal of evenhandedness until this morning, when I had a probabilistic revelation. Here's a step-by-step process for building a (nearly) perfect computer model (and I simplify a couple of steps slightly):

a) Take the end-of-year rankings from a decent computer model, grab 5 or 6 years of data (both said rankings and teams' wins and losses), and build about 700 regressions — two for each ranking spot (home and away). The regression formulas would attempt to predict how likely it would be for each ranking spot to beat each other ranking spot. (e.g. For the no. 10 home regression, it might be 44% for no. 2, 58% for no. 15, 75% for no. 25, 99.6% for no. 100, etc. The away regression would be slightly lower.)

b) Now, to the current season. Come up with some reasonable base rankings (winning percentage, RPI, whatever). For each team, do the following:

b') Pretend that team is no. 1. Use the no. 1 regression models to come up with the chances of winning for each game that team has won and the chances of losing for each game the team has lost. Multiply those chances together to come up with the probability that a team ranked no. 1 would win and lose those exact games.

b'') Repeat for nos. 2 through 350.

b''') Pick the ranking with the highest probability. That's the team's score.

c) Re-rank the teams using the scores in b''' and redo steps b and c about 100 times.
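Steps b and c can be sketched in a few lines of Python. This is only a rough illustration under loud assumptions: `win_prob` is a made-up logistic curve standing in for the roughly 700 regressions from step a (its 1.5 and 0.3 constants are arbitrary, not fitted to anything), the four-team league in `TOY_GAMES` is invented, ties in score are broken by winning percentage (the steps above don't say how to break them), and the loop stops early once the order stops changing. It also sums log-probabilities rather than multiplying raw probabilities, which picks the same ranking.

```python
import math

def win_prob(rank, opp_rank, home):
    # ASSUMED placeholder for step a's fitted regressions; constants are arbitrary.
    edge = 1.5 * math.log(opp_rank / rank) + (0.3 if home else -0.3)
    return 1.0 / (1.0 + math.exp(-edge))

def best_rank(results, n_ranks):
    # Steps b' to b''': try every ranking spot, keep the most likely one.
    def loglik(r):
        total = 0.0
        for opp_rank, home, won in results:
            p = win_prob(r, opp_rank, home)
            total += math.log(p if won else 1.0 - p)
        return total
    return max(range(1, n_ranks + 1), key=loglik)

def team_results(team, games, rank):
    # Turn the game list into (opponent rank, was home?, won?) for one team.
    out = []
    for home, away, home_won in games:
        if team == home:
            out.append((rank[away], True, home_won))
        elif team == away:
            out.append((rank[home], False, not home_won))
    return out

def rerank(games, rounds=100):
    teams = sorted({t for home, away, _ in games for t in (home, away)})
    wins = {t: 0 for t in teams}
    played = {t: 0 for t in teams}
    for home, away, home_won in games:
        played[home] += 1
        played[away] += 1
        wins[home if home_won else away] += 1
    pct = {t: wins[t] / played[t] for t in teams}
    # Step b: base rankings, here plain winning percentage.
    order = sorted(teams, key=lambda t: -pct[t])
    for _ in range(rounds):
        rank = {t: i + 1 for i, t in enumerate(order)}
        score = {t: best_rank(team_results(t, games, rank), len(teams))
                 for t in teams}
        # Step c: re-rank by score; winning percentage breaks ties (assumption).
        new_order = sorted(teams, key=lambda t: (score[t], -pct[t]))
        if new_order == order:
            break
        order = new_order
    return order

# Invented league: (home team, away team, did the home team win?).
TOY_GAMES = [
    ("A", "B", True), ("C", "A", False), ("A", "D", True),
    ("B", "C", True), ("D", "B", False), ("C", "D", True),
]
```

Calling `rerank(TOY_GAMES)` returns the team names best-first. A real version would swap `win_prob` for the per-spot home and away regressions and use 350 ranking spots.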

The beauty of this system is that it's the perfect equalizer between mid-major and major conference schools (for reasons I'll be happy to expand on if anyone is interested). The only downside is that undefeated teams are automatically at the top and winless teams are automatically at the bottom (since there's no strength of schedule component per se), but by the end of the season that's likely a moot point. Unfortunately, I have neither the time nor the mad programming skillz to build these rankings (or to use this insight to win $1,000,000), but hey, it's still a great idea.
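The undefeated/winless caveat is easy to see in a small Python sketch. Everything here is hypothetical: `win_prob` is an arbitrary logistic stand-in for the fitted regressions (assumed only to make a better-ranked team more likely to win), and `SCHEDULE` is an invented, deliberately weak slate of opponents. Because every win gets more likely as the pretend ranking improves, a team with only wins maxes out at no. 1 no matter whom it played, and a team with only losses bottoms out at no. 350.

```python
import math

N_RANKS = 350  # ranking spots, as in step b''

def win_prob(rank, opp_rank, home):
    # ASSUMED placeholder, not a fitted regression; constants are arbitrary.
    edge = 1.5 * math.log(opp_rank / rank) + (0.3 if home else -0.3)
    return 1.0 / (1.0 + math.exp(-edge))

def likelihood(rank, results):
    # Step b': chance that a team at `rank` wins and loses exactly these games.
    prob = 1.0
    for opp_rank, home, won in results:
        p = win_prob(rank, opp_rank, home)
        prob *= p if won else 1.0 - p
    return prob

def best_rank(results):
    # Step b''': the ranking spot with the highest probability.
    return max(range(1, N_RANKS + 1), key=lambda r: likelihood(r, results))

# Invented weak schedule: (opponent rank, home game?) pairs.
SCHEDULE = [(200, True), (310, False), (150, True), (275, False)]
undefeated = [(opp, home, True) for opp, home in SCHEDULE]
winless = [(opp, home, False) for opp, home in SCHEDULE]

print(best_rank(undefeated), best_rank(winless))
```

A single loss or win is enough to pull the score away from the extremes, which is why this mostly matters early in the season.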

oh so lovingly written by Matthew