Derby and Logic, by Stat Man: How it works [with as little maths as possible]

Alright, you likely saw the Women's UK & Ireland Derby Chart on the other page. I hope it makes sense. If you think your team's rank is in error, or I have used the team's name rather than the league's, send me an email at Roush.adam.h@gmail.com and I'll sort it for you. Just check this page first to make sure any bouts you think were missed are actually eligible to be considered. Thanks!

Also, I will publish a second explanation with full maths soon. Stay tuned if you love linear algebra!

Eligibility for teams

To be included in the calculations, a team must be an interleague (travel) team from a regularly bouting league located in the UK or Ireland. When the Channel Islands and the Isle of Man get derby going, I'll count them, too. The geographical limitation is based solely on the limited processing ability of the computer running these calculations, and the fact that it is not a simple calculation to run, as shall be made clear.

As the teams are travel teams representing their league, they should all be listed in the chart by league then level (if below A) for ease of understanding.

Eligibility for bouts

To be included, a bout must:

Have its score publicly listed, whether ticketing is open or closed. A public ranking scheme must be based solely on public data; a behind-closed-doors event cannot, in good conscience, factor into these calculations.
Take place within the last 12 months. Every sport's ranking scheme has expiry dates. Most sports reset at the end of a season. Even sports which run year-round (rugby, cricket) have an expiry date for their rankings. Otherwise, Australia would still be cruising on Bradman's batting.
Be between two eligible teams. Playing a European team, a US team, a mixed team, or even Team Sealand won't count, as those teams are ineligible for this UK & Ireland ranking scheme. As the calculations require both teams to be in the system, these bouts must be excluded.
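The three bout rules above can be sketched as a simple filter. This is only an illustration, not the code behind the chart: the `Bout` record, its field names, and the choice to treat "12 months" as 365 days are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Bout:
    # Hypothetical record layout, invented for this sketch.
    home: str
    away: str
    played_on: date
    score_is_public: bool


def is_eligible(bout, eligible_teams, today):
    """Return True if a bout passes all three rules listed above."""
    # Assumption: "the last 12 months" is approximated as 365 days.
    within_window = today - timedelta(days=365) <= bout.played_on <= today
    both_eligible = bout.home in eligible_teams and bout.away in eligible_teams
    return bout.score_is_public and within_window and both_eligible


eligible = {"LRG", "Croydon"}
today = date(2012, 9, 20)
print(is_eligible(Bout("LRG", "Croydon", date(2012, 3, 1), True), eligible, today))
print(is_eligible(Bout("LRG", "Team Sealand", date(2012, 3, 1), True), eligible, today))
```

The first bout passes; the second fails because Team Sealand is not in the eligible set.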

Rankability for teams
All eligible teams with eligible bouts are included in the calculation. However, to be ranked in the chart, a team must:

Complete more than one eligible bout. A team with only one eligible bout does not have enough data to be properly ranked. Two should provide enough, although a drastically different result in the third could result in a strong change to the rankings.

Complete at least one bout against a rankable team. Most unrankable teams have a strength value of 0. Thus, unrankable teams playing other unrankable teams all have scores of 0. Once one plays a rankable team, however, then this problem is solved.

Consider three hypothetical teams: Basingstoke, Guildford, and Slough. If their only bouts are against each other, it would be impossible to mathematically connect them to any other team. Subjectively, anyone would know that they're likely ranked below LRG, but without data, an objective scheme cannot reach the same conclusion.

However, once Basingstoke take on, say, Croydon, now the whole lot of them can be connected. Because one becomes rankable, and the other two played that rankable team, all three are now rankable teams.
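The Basingstoke example can be sketched as a small propagation loop. This is a rough illustration rather than the actual procedure: the function name is made up, and it assumes we start from a seed set of teams already known to be rankable (here, Croydon).

```python
from collections import defaultdict


def rankable_teams(bouts, seed):
    """Spread rankability outward from a seed set of rankable teams.

    bouts: list of (team_a, team_b) pairs, one per eligible bout.
    seed:  teams assumed to be rankable already (an assumption of this sketch).
    """
    opponents = defaultdict(list)
    for a, b in bouts:
        opponents[a].append(b)
        opponents[b].append(a)

    rankable = set(seed)
    changed = True
    while changed:
        changed = False
        for team, opps in opponents.items():
            if team in rankable:
                continue
            # Rule 1: more than one eligible bout.
            # Rule 2: at least one bout against a rankable team.
            if len(opps) > 1 and any(o in rankable for o in opps):
                rankable.add(team)
                changed = True
    return rankable


trio = [("Basingstoke", "Guildford"), ("Basingstoke", "Slough"), ("Guildford", "Slough")]
print(sorted(rankable_teams(trio, seed={"Croydon"})))
print(sorted(rankable_teams(trio + [("Basingstoke", "Croydon")], seed={"Croydon"})))
```

With only the three-way round robin, nobody beyond the seed becomes rankable. Add the Basingstoke v Croydon bout and all three join in.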

The Maths, itself

Ok, let's roll our collective sleeves up, and do the maths. I've been asked, "is this system based on points differential?" The answer is "both yes and no." Each team has the result of a bout considered under three separate formulae:

Pts%: Points for/total points. A team winning 150-100 would have .600 for this value.
Win%: 1 for a win, 0 for a loss. The team above would have 1.000 for this value.
Adj%: A complex formula, which considers the difference between a 9 and 10 point lead as greater than a 99 and 100 point lead. Thus a healthy lead is rewarded, and running up the score matters less and less. The team above would have .723 for this value.
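The first two values are simple enough to sketch in a few lines. The function names are made up for this example, and since the Adj% formula isn't spelled out on this page, it is left out here rather than guessed at.

```python
def pts_pct(points_for, points_against):
    """Pts%: points for divided by total points in the bout."""
    # Note: a 0-0 score would divide by zero; this sketch does not handle it.
    return points_for / (points_for + points_against)


def win_pct(points_for, points_against):
    """Win%: 1 for a win, 0 for a loss."""
    # Assumption: a tied score counts as 0 here; the page does not say.
    return 1.0 if points_for > points_against else 0.0


print(round(pts_pct(150, 100), 3))  # the 150-100 example above
print(win_pct(150, 100))
```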

Iteration

So, each of these three values is worked out for every team in every bout they played. Consider the score for team X, and their bout against team Y. To better determine how difficult that bout was, the value for team X in that bout is multiplied by the average value of team Y in all their bouts.

Why stop there? It would be still more data to know how Y did in their bouts, and how the teams they played in their bouts did against who they played, and how those teams did against the teams they played, and how those teams did against ... and so on, and so on.

Iterating this an infinite number of times is mathematically possible, and actually the easiest bit to do. At the end, it gives me an S_s value for each of the three formulae for each team: S_pts, S_win, and S_adj.
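One way to picture the "infinite" iteration is as a repeated update that is run until the numbers stop changing. The sketch below is my own guess at the general shape, not the real calculation: the function name, the rescaling step (dividing by the largest value so the numbers don't shrink towards zero), the stopping tolerance, and the three teams A, B and C are all assumptions.

```python
def iterate_strengths(bout_values, tol=1e-10, max_iter=1000):
    """Repeatedly re-weight each bout value by the opponent's current strength.

    bout_values: list of (team, opponent, value) triples, one per team per bout,
                 where value is e.g. that team's Pts% in the bout.
    """
    teams = sorted({t for t, _, _ in bout_values} | {o for _, o, _ in bout_values})
    strength = {t: 1.0 for t in teams}  # start everyone level

    for _ in range(max_iter):
        totals = {t: 0.0 for t in teams}
        counts = {t: 0 for t in teams}
        for team, opp, value in bout_values:
            totals[team] += value * strength[opp]
            counts[team] += 1
        new = {t: (totals[t] / counts[t] if counts[t] else 0.0) for t in teams}

        # Rescale so the strongest team sits at 1.0 (an assumption of this sketch).
        top = max(new.values())
        if top == 0:
            return new
        new = {t: v / top for t, v in new.items()}

        if max(abs(new[t] - strength[t]) for t in teams) < tol:
            return new
        strength = new
    return strength  # may not have settled if max_iter was reached


# Hypothetical Pts% values: A beat B and C, B beat C.
bout_values = [
    ("A", "B", 0.60), ("B", "A", 0.40),
    ("A", "C", 0.75), ("C", "A", 0.25),
    ("B", "C", 0.60), ("C", "B", 0.40),
]
strengths = iterate_strengths(bout_values)
print(sorted(strengths, key=strengths.get, reverse=True))
```

In this made-up round robin the values settle with A on top, then B, then C. The iteration cap is there because a schedule where teams split into two groups that only ever play across the divide can make a plain update like this bounce back and forth instead of settling.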

Strength Determination
Once the three S_s values are determined for each team, it's possible to obtain an overall value, which I call S_tot, for S-total. This is calculated as:

25% S_pts

50% S_win

25% S_adj

Where does this formula come from? Well, it's empirical. I ran the results for several different weeks, and adjusted the weights until they produced the lowest number of upsets. With this formula, the upset rate is 17% at best and 28% at worst. Any other combination does worse.
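As a quick sketch, the 25/50/25 weighting looks like this in code. The function and constant names are invented for the example, and the input values are made up purely to show the arithmetic.

```python
# Weights taken from the formula above.
WEIGHTS = {"pts": 0.25, "win": 0.50, "adj": 0.25}


def s_tot(s_pts, s_win, s_adj):
    """Combine the three strength values into one overall strength."""
    return (WEIGHTS["pts"] * s_pts
            + WEIGHTS["win"] * s_win
            + WEIGHTS["adj"] * s_adj)


# Hypothetical strength values for one team.
print(round(s_tot(0.8, 0.4, 0.6), 4))
```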

Determining Rank Pts
After all that, you'd think I'm done. Nope, S_tot is not the rank points, it's a measure of how difficult a team is to play. But, it's only one step away from the final. To determine the rank points, the final formula is run:

The Adj% value for each bout is multiplied by the S_tot value for the opponent, and the average of all of these products taken. This average comes out as a number between 0 and 1, thus unfriendly to read. To correct this, the number is multiplied by a thousand, and displayed with only one decimal place.
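That last step can be sketched as follows. The function name and the sample numbers are hypothetical; only the recipe (multiply, average, times a thousand, one decimal place) comes from the description above.

```python
def rank_points(bouts):
    """Rank points for one team.

    bouts: list of (adj_pct, opponent_s_tot) pairs, one per bout.
    A rankable team has at least two bouts, so the list is assumed non-empty.
    """
    products = [adj * opp_strength for adj, opp_strength in bouts]
    average = sum(products) / len(products)
    return round(1000 * average, 1)


# Hypothetical team with two bouts.
print(rank_points([(0.5, 0.04), (0.8, 0.05)]))
```

Here the two products are 0.02 and 0.04, which average to 0.03, so the team ends up on 30.0 rank points.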

Voilà!
Now we have rank points, and we can order the teams by rank points, and we're done. Publish that, and the chart's done.

Remember that rank points aren't linear! A team with 50.0 rank points is just as far ahead of a team with 25.0 rank points as a team with 5.0 rank points is ahead of a team with 2.5 rank points. In both cases, the higher ranked team has about the same chance of winning should the two go head-to-head.

Does that all make sense? I hope it does! If not, feel free to hit me up on Facebook or at Roush.adam.h@gmail.com, or in person when I'm not announcing at a bout, and I'll do my best to answer your questions. Likewise, if you feel that eligible bouts were not counted, or were mis-counted, get in touch.

Roll on!

Thursday, September 20, 2012
