yebellz or someone else who's got a really good grasp of the math, can you back me up here and help me explain what's wrong with the current code in terms of a true ELO system? I'm not hugely skilled in expected value calculations and statistics.
Oli, by happenstance the numbers came out more or less believable (and as I show below can be used for a ranking system), however the math behind the calculation is, to the best of my knowledge, fundamentally wrong... that is IF the calculation is intended to be an ELO type system with expected values and actual values of player performance.
The math from the wiki makes sense for an ELO system (I've adjusted it with a few additions, marked with an asterisk, to refine certain calculations):
Expected result player1 (Re1) and player2:
Re1 = 1 / ( 1 + ( 10^( (Rating2 - Rating1) / 400) ) )
Re2 = 1 - Re1
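To make the expected-result formula concrete, here is a minimal Python sketch of it. The function name expected_result and the sample ratings are my own, purely for illustration; only the formula itself comes from the wiki description.

```python
def expected_result(rating1: float, rating2: float) -> float:
    """Expected result of player1 against player2 (Re1), a value between 0 and 1."""
    return 1.0 / (1.0 + 10.0 ** ((rating2 - rating1) / 400.0))


# Equal ratings: each player is expected to take half of the result.
re1 = expected_result(1500, 1500)
re2 = 1.0 - re1
print(re1, re2)

# A 200 point rating lead gives an expected result of roughly 0.76.
print(round(expected_result(1700, 1500), 2))
```

Note that Re1 + Re2 is always 1, which is what lets the gains and losses between a pair of players cancel out.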
Real Result Calculation:
*For PPSC survives only:
Rr1 = Pl1SCs/(Pl1SCs + Pl2SCs) (if both have 0 SCs set Rr1 = 0)
Rr2 = Pl2SCs/(Pl1SCs + Pl2SCs) (if both have 0 SCs set Rr2 = 0)
*Rr1 and Rr2 then become some number between 0 and 1
*For PPSC or WTA solo by player1:
*Rr1 = 1
*Rr2 = 0 (survive or defeat)
*For PPSC or WTA draw vs draw:
*Rr1 = 0.5
*Rr2 = 0.5
*For PPSC or WTA draw vs defeat:
*Rr1 = 1 (draw)
*Rr2 = 0 (defeat)
*For WTA survive vs survive or survive vs defeat:
*Rr1 = 0.5 (survive)
*Rr2 = 0.5 (survive or defeat)
*For PPSC defeat vs defeat:
*Rr1 = 0.5
*Rr2 = 0.5
*Likewise CD's can be rated 0:1 against other performances
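The case list above could be sketched roughly as follows. This is only my reading of the list, not anything from the actual code: the function name real_result and the status strings "solo", "draw", "survive" and "defeat" are hypothetical, CDs are left out, and any pairing the list does not mention raises an error rather than guessing.

```python
def real_result(game_type, status1, scs1, status2, scs2):
    """Return (Rr1, Rr2) for one pairing, following the case list above.

    game_type is "PPSC" or "WTA"; the status strings are hypothetical labels.
    """
    pair = {status1, status2}
    if "solo" in pair:  # PPSC or WTA solo: the soloist takes everything
        return (1.0, 0.0) if status1 == "solo" else (0.0, 1.0)
    if pair == {"draw"}:  # draw vs draw
        return 0.5, 0.5
    if pair == {"draw", "defeat"}:  # draw vs defeat
        return (1.0, 0.0) if status1 == "draw" else (0.0, 1.0)
    if game_type == "WTA" and "survive" in pair and pair <= {"survive", "defeat"}:
        return 0.5, 0.5  # WTA survive vs survive, or survive vs defeat
    if game_type == "PPSC" and pair == {"defeat"}:
        return 0.5, 0.5  # PPSC defeat vs defeat
    if game_type == "PPSC" and pair == {"survive"}:
        total = scs1 + scs2
        if total == 0:  # both have 0 SCs
            return 0.0, 0.0
        return scs1 / total, scs2 / total  # share of the SCs held by the pair
    raise ValueError(f"pairing not covered by the case list: {status1} vs {status2}")


print(real_result("PPSC", "survive", 6, "survive", 2))
print(real_result("WTA", "survive", 10, "defeat", 0))
```

In every covered case except the "both have 0 SCs" one, Rr1 + Rr2 = 1, which mirrors Re1 + Re2 = 1.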
Relative Performance Calculation:
D1 = Rr1 - Re1
D2 = Rr2 - Re2
mV, gV then adjust the final score change based on player investment and map size to scale things appropriately.
scoreMatchPlayer1 = K * gV * mV * D1
scoreMatchPlayer2 = K * gV * mV * D2
When applied player vs every other player, for all players, this is a multiplayer extension of an Elo based system. In this case mV should not additionally be used to calculate any performance difference based on SC proportions between the players.
D1 and D2 never end up zero (unless a player's actual performance against his opponent exactly matches his expected performance), and so the score adjustment occurs for each player against every other player every time as it should.
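As a rough illustration of the pairwise scheme, here is a small Python sketch that applies the formulas above for every player against every other player. The names match_changes and real_results are mine, and K = 32 with gV = mV = 1 are placeholder values I picked for the example, not the values used on the site.

```python
from itertools import permutations


def expected_result(rating1, rating2):
    """Re1: expected result of player1 against player2."""
    return 1.0 / (1.0 + 10.0 ** ((rating2 - rating1) / 400.0))


def match_changes(ratings, real_results, k=32.0, gv=1.0, mv=1.0):
    """Sum K * gV * mV * (Rr - Re) for each player over all of their opponents.

    real_results maps (player, opponent) to that player's Rr against the opponent.
    k, gv and mv are placeholder scaling values (assumptions for this sketch).
    """
    changes = {player: 0.0 for player in ratings}
    for p1, p2 in permutations(ratings, 2):
        d1 = real_results[(p1, p2)] - expected_result(ratings[p1], ratings[p2])
        changes[p1] += k * gv * mv * d1
    return changes


# Three equally rated players; A solos, B and C are rated level with each other.
ratings = {"A": 1500, "B": 1500, "C": 1500}
real_results = {
    ("A", "B"): 1.0, ("B", "A"): 0.0,
    ("A", "C"): 1.0, ("C", "A"): 0.0,
    ("B", "C"): 0.5, ("C", "B"): 0.5,
}
changes = match_changes(ratings, real_results)
print(changes)
```

With these inputs A gains what B and C lose between them, so the total over all players is zero. That holds whenever Rr1 + Rr2 = 1 for every pair.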
Now, however, since the code is different than the wiki description, I attempted to put together a description for what I think is happening in the code (not 100% sure I represented it accurately but I did my best):
---(my commentary on the validity of the calculation is displayed in this form)
Relative player skill for player1 (Re1) and player2 (Re2):
Re1 = 1 / ( 1 + ( 10^( (Rating2 - Rating1) / 400) ) )
Re2 = 1 - Re1
---(good, though this is no longer really an expected value, it results in a number between 0 and 1 and is a measure of relative player skill of sorts)
SC counts (SCc) and percentage of total SC's (SCq) are adjusted if needed as follows...
---(note that SCc as adjusted below no longer represents supply center counts, but rather, arbitrary 1 and 0 values that serve to function as a switch indicating a gradient of performances, as it were)
RESIGNS/DEFEATS
If any player resigned or was defeated set their SC totals and percentages to zero
SCc = 0
SCq = 0
---(This makes sense, they have no SC's)
CD's
If player1 CD'd then player2 gets an SC count of SCc2 = 1, unless he also CD'd in which case it is 0. The SC proportions are SCq2 = 1/# of players, unless player2 also CD'd in which case SCq2 = 0.
The same applies in reverse if player2 CD'd, whether or not player1 did.
---(Basically SCc is now a flag indicating that a player who did not CD did better than a player who did, regardless of the actual performance of the player who did not CD.)
DRAWS
If player1 was in the draw, SCc1 = 1, SCq1 = 1/# of players in draw
If player2 was in the draw, SCc2 = 1, SCq2 = 1/# of players in draw
---(Again, SC count is no longer an SC count but an arbitrary binary value of either 1 or 0, flagging that a draw is rated 1 against a defeat's 0; if both players drew, each gets a 1 and their mV ends up 0, thus resulting in no score change between the two)
WTA SOLO
If player1 solo'd and game type is WTA: SCc1 = 1, SCq1 = 1 and SCc2 = 0, SCq2 = 0
likewise if player 2 solo'd
---(Here again SCc is changed to an arbitrary binary value indicating success or failure rather than SC count).
ALL OTHER CASES:
If player1 had more SC's than player2: SCc1 = 1, SCc2 = 0
Otherwise if player2 had more SC's than player1: SCc2 = 1, SCc1 = 0
Otherwise if player1's SC count is not 0 both player1 and player2 get their counts adjusted to 1: SCc1 = 1, SCc2 = 1
---(Again, basically, player1 either completely succeeds against player2 while player2 completely loses, or vice versa, even if player1 had only 1 more SC than player2. This section of the code obliterates any remaining information SCc contained about the actual SC counts and, together with all the other adjustments to SCc, serves to turn SCc into a binary flag rather than an SC count. At least this is consistent).
The performance flags are then calculated:
$Rr1 = ( ($SCc1 + $SCc2) > 0 ) ? ($SCc1 / ($SCc1 + $SCc2)) : 0;
$Rr2 = ( ($SCc1 + $SCc2) > 0 ) ? ($SCc2 / ($SCc1 + $SCc2)) : 0;
---(So basically from what I can tell, since SCc1 and SCc2 are always either 1 or 0, Rr1 : Rr2 is always either 1:0 or 0:1 for either PPSC or WTA game types, based on which player had the greater result of W/D/S/E/B (Win/Draw/Size of Survive/Eliminated/Booted for CD). For one level of finish vs a lower one:
Rr1 = 1
Rr2 = 0
For equal levels of finish (other than equal survives) such as Draw vs Draw, Defeat vs Defeat or CD vs CD, effectively:
Rr1 = 0
Rr2 = 0
Basically Rr ceases to be a measure of Real Result, just as SCc has ceased to be a measure of SC count, and Rr is a performance flag based on the W/D/S/E/B criteria)
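To check what the two $Rr lines can actually produce once SCc has been reduced to 1 or 0, here is a direct Python transcription of them (the function name performance_flags is mine):

```python
def performance_flags(scc1, scc2):
    """Python transcription of the two $Rr lines quoted from the code."""
    total = scc1 + scc2
    if total > 0:
        return scc1 / total, scc2 / total
    return 0.0, 0.0


# With SCc restricted to 1 or 0 there are only four possible inputs.
for flags in [(1, 0), (0, 1), (1, 1), (0, 0)]:
    print(flags, performance_flags(*flags))
```

Taking the two lines at face value, flags of 1 on both sides give 0.5 : 0.5 rather than 0 : 0. As far as I can tell those are pairings where mV comes out 0 (draw vs draw, for example), so the difference should not show up in the score change.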
mV is then calculated based on the SC proportions:
mV = abs(SCq1 - SCq2)
---(SCq is valid as a proportion of SC's for the case of defeats/resigns, and in the case of any player vs CD it is reasonable at 1/# of players thus effectively splitting all the score everyone gained against the CD'd player by number of players, though this should probably actually be (# of players - # of players that CD'd). It also looks to be valid for a WTA solo at 100% and 0% for all others, and it remains in all other cases the original SC proportion, including for PPSC solos, so in the end SCq is still essentially a valid quantity for SC proportion).
mV is then further adjusted based on whether a player joined later (betting differently than the original players).
gV is calculated based on victory conditions and map size.
---(no detail provided here as these calculations are arbitrary scaling values that we've all more or less agreed on and work more or less as intended).
I now diverge from following the code to comment on this method as a ranking system:
So mV at this point contains all the information about relative performance between players (scaled by their share of risk in the original game), and real result is now a binary number, or a flag of sorts, that no longer means what it was supposed to.
Thus when you get to the equation in the code for
Ch1 = (Rr1 - Re1) * mV * gV
the quantity (Rr1 - Re1) has no real meaning, as Rr is now essentially a binary switch that indicates either a score increase or a score decrease based on the criteria:
Win > Draw > Survive (Survives being ranked from largest to smallest) > Defeated > CD
Re1 then becomes an arbitrary value that increases or decreases the size of the score adjustment based on something approximating an expectation. But since the loop isn't closed via Rr to change the score based on expected vs actual result, this isn't really an expectation any more, though it is a sort of measure of player skill going into a match.
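A quick numeric check of that point, as a Python sketch. The function name code_change and the values mV = 0.25 and gV = 1 are made up for illustration; only the form (Rr1 - Re1) * mV * gV comes from the code.

```python
def code_change(rr1, re1, mv, gv):
    """Ch1 = (Rr1 - Re1) * mV * gV, as in the code."""
    return (rr1 - re1) * mv * gv


mv, gv = 0.25, 1.0  # hypothetical values, for illustration only
for re1 in (0.1, 0.5, 0.9):
    # Rr1 = 1 always gives an increase, Rr1 = 0 always gives a decrease;
    # Re1 only changes how large the adjustment is.
    print(re1, code_change(1, re1, mv, gv), code_change(0, re1, mv, gv))
```

Because Re1 always lies strictly between 0 and 1, a binary Rr1 fixes the sign of the change on its own, and mV carries the size of the performance gap.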
So it turns out, Oli, that your system as currently implemented behaves fairly well as a ranking system, but if the goal is a multiplayer ELO system, your system no longer is one.
Your code and calculations can be significantly simplified if you like this current system, which is basically a weighted score adjustment based on the criteria of W/D/S/E/B:
Win > Draw > Size of Survive (Survives being ranked from largest to smallest) > Eliminated/Resigned > Booted for CD
with equal performances between players resulting in no score adjustment.
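To show how simple the direction of the adjustment becomes under that ordering, here is a Python sketch. Everything in it (the LEVEL table, the status strings, the function names) is hypothetical and only encodes the ordering above; the size of the adjustment would still come from the existing weighting.

```python
# Hypothetical encoding of Win > Draw > Survive > Eliminated/Resigned > Booted for CD.
LEVEL = {"win": 4, "draw": 3, "survive": 2, "eliminated": 1, "resigned": 1, "cd": 0}


def finish_key(status, scs=0):
    """Sort key for a finish; SC count only matters between two survives."""
    return (LEVEL[status], scs if status == "survive" else 0)


def direction(status1, scs1, status2, scs2):
    """+1 if player1 finished above player2, -1 if below, 0 if equal."""
    k1, k2 = finish_key(status1, scs1), finish_key(status2, scs2)
    return (k1 > k2) - (k1 < k2)


print(direction("draw", 5, "survive", 12))    # draw beats any survive
print(direction("survive", 7, "survive", 7))  # equal survives: no adjustment
print(direction("cd", 0, "eliminated", 0))    # CD ranks below elimination
```

The sign returned here plays the role that the binary Rr flag currently plays in the code.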
I am more than happy to keep your current system, but we shouldn't claim it is an ELO system any longer and I would like to request that we also get a true ELO system implemented alongside your system at some point for comparison.