Input on an alternate scoring system needed...
As the Dpoints are not an ideal way to represent a player's game strength, I'm thinking about implementing an alternate rating system (in addition to the traditional Dpoints).
Any math experts here?
Page 13 of 25
Retillion (2304 D (B))
06 Feb 13 UTC
I also think that 1 vs 1 games should be ranked separately.

Indeed, a 1 vs 1 game has no diplomacy at all. It requires absolutely no communication and is purely a tactical game. Isn't Diplomacy a communication game?
hiporox (988 D)
06 Feb 13 UTC
@Retillion: The same is true of Gunboat games
Retillion (2304 D (B))
06 Feb 13 UTC
@hiporox: I almost agree with you, and for that reason I am not at all interested in playing Gunboat games, but many players consider their orders a form of communication.
yebellz (0 D)
06 Feb 13 UTC
Omitting or separately ranking 1v1 games, gunboats, and strange/unbalanced variants may be a very good idea, since those would all be very different games.

With 3-player games at the lower end of the spectrum, you will still see a slight bias toward larger games if you use a normalizing factor of n. A 3-player game would get 2/3, while an n-player game gets (n-1)/n. So at the extreme, very large games will be worth almost 1.5 times as much as 3-player games.
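
To make that arithmetic concrete, here is a small Python sketch. It assumes, as described above, that a winner's total credit in an n-player game is (n-1)/n of a full win (n-1 pairings, each divided by n); the function name normalization_factor is hypothetical.

```python
from fractions import Fraction


def normalization_factor(n: int) -> Fraction:
    """Credit for beating all n-1 opponents when each pairing is divided by n."""
    if n < 2:
        raise ValueError("a game needs at least 2 players")
    return Fraction(n - 1, n)


# Compare each game size against the 3-player baseline of 2/3.
baseline = normalization_factor(3)
for n in (3, 7, 17, 35, 1000):
    ratio = normalization_factor(n) / baseline
    print(f"{n:>4} players: factor {normalization_factor(n)}, "
          f"{float(ratio):.3f}x a 3-player game")
```

The ratio creeps toward 3/2 but never reaches it, which is the "almost 1.5 times" bias.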
G-Man (2516 D)
06 Feb 13 UTC
I concur again.
Oli (977 D Mod (P))
06 Feb 13 UTC
I think we shouldn't scale the large games down that much.
Basically, you play many more people in a 35-player game than in a 3-player game (17 times as many), and the final outcome of an Elo-based rating (the approach behind a generic score) is always the same. But a four-month 34-player game that adjusts your rating by only a small margin might lead to a psychological letdown and make people dislike such large games (even if it is mathematically correct). So I would like to allow more/bigger changes, even if that means generating a bigger uncertainty just because of this psychological effect.
Anon (?? D)
06 Feb 13 UTC
Join this game plz gameID=12420
yebellz (0 D)
08 Feb 13 UTC
RE: "So I would like to allow more/bigger changes, even if that means generating a bigger uncertainty just because of this psychological effect."

Ok, so how about the following proposal to give a slight boost to large games without overweighting them too much:

Out of the 83 variants listed on this site, the vast majority are for 2-10 players. There are 11 variants that are suited to 11 or more players. Specifically, they serve game sizes of 11, 12, 12, 13, 13, 15, 17, 19, 34, 34, and 35. I would consider these to be the "big" to "jumbo" games.
Collectively, ~210 games across these 11 variants have been played, representing a relatively small fraction of the ~6700 games finished on this server.

Since Oli wishes to see a ranking system that gives a slight boost to these large games, how about introducing an additional weighting factor for games with more than 10 players, according to the following rule:

weighting factor for games with more than 10 players = min( (number of players)/10 , 3)
Note: this weighting factor is capped at x3 to avoid super-over-weighting any other large variants that come along. The cap value of 3 is somewhat arbitrary and can be tweaked a bit, but I would definitely avoid letting this factor get much larger since I think that might result in too much volatility.
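
A minimal Python sketch of the proposed weight, assuming that games with 10 or fewer players simply keep a weight of 1 (the rule above only covers larger games). The name large_game_weight and the cap parameter are illustrative.

```python
def large_game_weight(num_players: int, cap: float = 3.0) -> float:
    """Extra weight proposed for big games: players/10, capped at `cap`.

    Assumption: games with 10 or fewer players get a neutral weight of 1.
    """
    if num_players <= 10:
        return 1.0
    return min(num_players / 10, cap)


for n in (7, 11, 15, 19, 34, 35):
    print(n, large_game_weight(n))
```

Because 11/10 = 1.1, the weight rises smoothly from 1 instead of jumping at the 10-player boundary, and the cap only takes effect from 30 players upward.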

This weighting factor in addition to the (n)-normalization on pairwise adjustments would result in the following outcomes for winning a game of various sizes against equally ranked opponents:
Winning a 5-player game: +7.5*(4/5) = 6 pts
Winning a 7-player game: +7.5*(6/7) = ~6.43 pts
Winning a 15-player game: +7.5*(14/15)*(1.5) = 10.5 pts
Winning a 35-player game: +7.5*(34/35)*(3) = ~21.86 pts

The 7.5 factor is half of the arbitrary base value of 15 used within the K-factor calculation (a win against an equally rated opponent scores 1 where 0.5 was expected). The value of 15 could be further tweaked to trade off speed of convergence against volatility.
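
The four figures above can be reproduced with a short Python sketch. It assumes a standard Elo expected score on the usual 400-point logistic scale (the thread does not state the scale), K = 15, each of the winner's n-1 pairwise 1:0 results divided by n, and the capped large-game weight from the proposal. All function names are hypothetical.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def large_game_weight(n: int, cap: float = 3.0) -> float:
    """Proposed extra weight: 1 up to 10 players, else n/10 capped at `cap`."""
    return 1.0 if n <= 10 else min(n / 10, cap)


def win_gain(n: int, rating: float = 1000.0, k: float = 15.0) -> float:
    """Points gained for winning an n-player game against n-1 equally rated
    opponents: each pairing scores 1:0 and its change is divided by n."""
    per_pair = k * (1.0 - expected_score(rating, rating)) / n
    return (n - 1) * per_pair * large_game_weight(n)


for n in (5, 7, 15, 35):
    print(n, round(win_gain(n), 2))  # expected: 6.0, 6.43, 10.5, 21.86
```

Against an equal opponent the expected score is exactly 0.5, so each pairing is worth K × 0.5 / n before weighting.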
Oli (977 D Mod (P))
08 Feb 13 UTC
That sounds much better.
I will add some more code for the stats over the weekend, but I'll use an algorithm we discussed a few days ago (because each time I start coding, the discussion progresses and I need to start over before I reach a prototype).
Oli (977 D Mod (P))
08 Feb 13 UTC
Another thought.
Even if WTA should award everything to the winner, I think we should adjust the ratings of the survivors and the defeated too. A 20-player game should not adjust the score of only 1 player. A player who constantly survives in WTA games is much more likely to win sometimes, even if his rating is just as bad as that of a player defeated on turn 4. And his rating should express this.
Also, generating scores for the survivors and the defeated will produce much more scoring data and make a defeat much more costly (for example, one loss to the winner and a 0.3 result against each of the survivors).
Oli (977 D Mod (P))
10 Feb 13 UTC
Ok. Some results are now up for discussion. You can check the HoF again from the menu.

How does it score:
It breaks every game in smaller 1on1.
Win->Draw->Survive->Defeated/Resigned, scored as a pure 1:0 between different results (or 0.5:0.5 for the same result).
Defeated:Defeated -> Not counted.
In WTA: Survive:Survive = 0.5:0.5, Survive:Defeated = 0.75:0.25
In PPSC: Survive:Survive = SCs:SCs
Each game gets a gValue (100% - 1 for each player)
If a player took over a country, the game does not count as much.
If a game did not reach the targetSC for the variant (because of custom settings), that reduces its value too.

Does not work at the moment: CDs score 0:1 against everybody.
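
The way a game is broken into 1on1 results can be sketched roughly as follows in Python. This is an illustration only, not the HoF code: the rank numbers, the (status, supply_centers) tuples, and reading "SCs:SCs" as each survivor scoring its share of the pair's combined supply centers are all assumptions. CDs and the gValue are left out.

```python
from itertools import combinations

# Hypothetical ranks for the ordering Win -> Draw -> Survive -> Defeated/Resigned.
RANK = {"won": 3, "drawn": 2, "survived": 1, "defeated": 0, "resigned": 0}


def pair_score(a, b, pot_type):
    """A's score (0..1) against B, or None if the pair is not counted.
    a and b are (status, supply_centers) tuples."""
    (status_a, sc_a), (status_b, sc_b) = a, b
    ra, rb = RANK[status_a], RANK[status_b]
    if ra == 0 and rb == 0:
        return None  # Defeated:Defeated is not counted
    if pot_type == "WTA" and {ra, rb} == {1, 0}:
        return 0.75 if ra == 1 else 0.25  # Survive:Defeated = 0.75:0.25
    if pot_type == "PPSC" and ra == rb == 1:
        total = sc_a + sc_b  # Survive:Survive split by supply centers
        return 0.5 if total == 0 else sc_a / total
    if ra == rb:
        return 0.5
    return 1.0 if ra > rb else 0.0


def break_into_pairs(players, pot_type):
    """Split one game into 1on1 results: {(name_a, name_b): score_of_a}."""
    results = {}
    for (name_a, a), (name_b, b) in combinations(players.items(), 2):
        score = pair_score(a, b, pot_type)
        if score is not None:
            results[(name_a, name_b)] = score
    return results


# Invented 5-player example.
game = {"A": ("won", 18), "B": ("survived", 10), "C": ("survived", 6),
        "D": ("defeated", 0), "E": ("defeated", 0)}
print(break_into_pairs(game, "PPSC"))
print(break_into_pairs(game, "WTA"))
```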

You can click on each player-name in the HoF to get a list of his games.
You can click on a gameID to get a breakdown of the data.
kaner406 (2088 D Mod (B))
11 Feb 13 UTC
Do the stats skew with variants like Rinascimento? I ask because I took hits of -43 (Ferrara) and -32 (Milano). It would seem to me that losing this variant as Ferrara should cost less than losing as Milano, which starts with more units.
G-Man (2516 D)
11 Feb 13 UTC
Awesome. Just awesome. The list of each player's games and the game data are really nice features. Great job.

Personally, I do think the larger variants are in fact weighted a bit too heavily. For example, I've only played WW IV once, for a loss (a team game which was drawn by 27 out of 35 players just after I was eliminated), and that one loss accounts for -86 of my score, roughly 37 D more than any other single result I've generated, and by far my largest.

Additionally, one game of WW4 and one game of Chaos account for -119 of my total -136 negative points, which is 3 games out of the 15 games of 7 or more players I've played here (vs. 302 D for the other 12 of 15 games). So, with a loss being far more likely in WW4 or Chaos than in anything under 18 players, it behooves me to just avoid those variants if I care about my rating, and that seems a little severe.

Otherwise, looking over who was in my games and the results, the player vs. player, smaller variant, and Win/Draw/Survive/Defeated scoring feels pretty good. All in all, I really like where you're going and this is going to be a fantastic upgrade for everyone. Thanks Oli!
Devonian (1887 D)
11 Feb 13 UTC
Oli, I don't understand how the scoring works from your explanation.

I have games with negative points when I had a draw and zero points for wins.

Also, what does this mean:
Each game gets a gValue (100% - 1 for each player)
Spartan22 (1883 D (B))
11 Feb 13 UTC
Just curious, where is this available to be viewed? I may have missed where it was said in the thread...
kaner406 (2088 D Mod (B))
11 Feb 13 UTC
^the HoF menu-link in the top right-hand corner
Devonian (1887 D)
11 Feb 13 UTC
I also have a win where I earned negative points.
Spartan22 (1883 D (B))
11 Feb 13 UTC
@kaner thank you very much!
Captainmeme (1400 D Mod (B))
11 Feb 13 UTC
I'm also fairly surprised by some changes... For my last two Havens (both of which were very respectable draws) I have 0 and +4 D respectively, whereas I have +38 D for a 9 SC survive the time before...
Captainmeme (1400 D Mod (B))
11 Feb 13 UTC
Ouch... I just looked through again, and the 1v1s have a huge effect, considering how many you can finish in the time it takes to play one Haven. I don't think the large maps are overweighted, considering how much time you have to put into them compared to the others (although I'm still confused as to how I got 0 D for gameID=9759...)
Oli (977 D Mod (P))
11 Feb 13 UTC
The 1on1s you mentioned score so much because the skill difference is so large. In contrast, because you were so heavily favored in the last Haven games, you didn't gain many points.
This is just how an Elo-based system works. Once you have "your" rating, it does not move that much until you perform better than before.

Also, there might still be bugs in the code. In particular, the gV-value does not seem right in some games, and there seem to be rounding errors.
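
The effect described above (a heavy favorite gains little, while an upset moves a lot of points) falls out of the standard Elo update. A minimal Python sketch, assuming the common 400-point logistic expected score and K = 15 from earlier in the thread; the ratings 1300 and 1000 are made up for illustration.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_change(r_a: float, r_b: float, score_a: float, k: float = 15.0) -> float:
    """A's rating change for one 1on1 result, score_a being 1, 0.5 or 0."""
    return k * (score_a - expected_score(r_a, r_b))


favourite_wins = elo_change(1300, 1000, 1.0)  # roughly +2.3
underdog_wins = elo_change(1000, 1300, 1.0)   # roughly +12.7
print(round(favourite_wins, 2), round(underdog_wins, 2))
```

Because the favorite is already expected to score about 0.85, a win can only add the small remaining gap to 1, which is why a rating settles once it matches your results.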
Retillion (2304 D (B))
11 Feb 13 UTC
Oli, thank you very much for your work !
I have just finished a game today and it has not been taken into account yet.
I'm just mentioning it in case you didn't know that it doesn't happen automatically.
kaner406 (2088 D Mod (B))
11 Feb 13 UTC
Ah - I forgot that it was based on the Elo system. It's all looking pretty good Oli -> I know I might be asking for the stars here - but will we be able to get variant specific rankings on top of this as well? (who is the best Known World 901 player? etc... :)
Leif_Syverson (1725 D Mod)
11 Feb 13 UTC
Looks great! I like being able to trace through and see where in the sequence of games I did well and where I lost points. Invaluable feature!!

Is the wiki updated on the formulas for us to double check the math if we think there might be a bug? (I don't see anything at the moment, but I'd just be interested in running the math through myself to understand it.)

I also agree with GMan: losing large games should be the expected result for most players, so the penalty for losing those games shouldn't be all that steep, while the rewards for winning should be large.

I'm also thinking that by inflating/deflating certain results (depending on how they are implemented), the final calculation may lose its zero-sum property (a necessity for an Elo-style rating to work correctly). Being able to check the math on the wiki would help me (and others) verify this.
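
One way to check the zero-sum property is to sum every player's change after a batch of pairwise updates. The Python sketch below assumes plain pairwise Elo with K = 15 and the usual 400-point scale; the optional adjust hook is a hypothetical stand-in for any per-player tweak, here one that halves losses, as a softer penalty for large games might.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def rating_changes(ratings, scores, k=15.0, adjust=None):
    """Apply pairwise Elo updates; scores[(a, b)] is a's score against b.
    adjust(player, delta), if given, rescales each side's change separately."""
    deltas = {p: 0.0 for p in ratings}
    for (a, b), s in scores.items():
        d = k * (s - expected_score(ratings[a], ratings[b]))
        da, db = d, -d  # symmetric: what A gains, B loses
        if adjust is not None:
            da, db = adjust(a, da), adjust(b, db)
        deltas[a] += da
        deltas[b] += db
    return deltas


def soften_losses(player, delta):
    """Hypothetical asymmetric tweak: only half of any loss is applied."""
    return delta * 0.5 if delta < 0 else delta


ratings = {"A": 1200, "B": 1100, "C": 1000}
scores = {("A", "B"): 1.0, ("A", "C"): 1.0, ("B", "C"): 0.5}
symmetric = rating_changes(ratings, scores)
softened = rating_changes(ratings, scores, adjust=soften_losses)
print(sum(symmetric.values()), sum(softened.values()))
```

The symmetric version sums to zero (up to floating-point noise), while softening only the losses makes the total positive, so the rating pool slowly inflates.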

I'd like to mention some additional features for this particular rating (no hurry or rush, just to get the ideas down for future reference): I think that having an 'all-time best ranking' (for instance, mine was 1279, and I don't know where that would have put me at the time, but currently it would put me 3rd) and a monthly rating would be good ideas. Here's part of the reason I mention these additional features (though maybe fine-tuning some of the parameters would better address what I'm seeing):

It seems there is going to be a fair bit of swing in the rankings. I'm currently co-ranked with 3 other players at 1144, and my average point swing per game is 15 D; with 34 players inside that 15-point average swing, my next game would on average put me either up to 60th from 76th or down to 95th. Considering we are only ranking 150 players right now, and the range of outcomes for a single game is *on average* (not counting max or min) 35 positions, there will be a lot of turnover daily as games finish, so things will change rather rapidly. With the range between 1st and 150th being just over 200 D (way too small for an Elo system, imo), and with swings as high as 89 that I've seen, you can climb the ladder (or fall off it) rather suddenly. Does anybody have a good idea of how these game-to-game changes would compare to the Ghostrating over at webdip?
kaner406 (2088 D Mod (B))
11 Feb 13 UTC
@LS - I don't know how feasible it would be, but a HoF-t (Hall of Fame - total) as a separate page that shows everyone's ranking, like Ghostrating, would be pretty awesome.
yebellz (0 D)
11 Feb 13 UTC
@Oli, what are the exact formulas that you implemented?

I disagree with the contention that a player who consistently survives WTA games is performing well. All that demonstrates is that they consistently allow another player to solo while surviving long enough to be blamed for failing to prevent it.

Any rating system would inherently define the objectives of the game, since players will aim to optimize their ratings performance. Hence, if you give a benefit to surviving versus being defeated, that would create an incentive to play for survival while giving up easy solos in WTA games, especially for lower ranked players that stand to benefit most from such a strategy.
BeauLemioux (1905 D)
11 Feb 13 UTC
Damn! I lost about 40 D for a game that was drawn instead of cancelled! :P
Leif_Syverson (1725 D Mod)
11 Feb 13 UTC
I tend to agree with you on WTA games, yebellz. That is the main difference between WTA and PPSC.
Devonian (1887 D)
11 Feb 13 UTC
I think there is definitely a bug in the system. I should not lose points for winning. gameID=12224
Oli (977 D Mod (P))
11 Feb 13 UTC
It's already fixed:

I'm not 100% convinced, but I changed the WTA formula so that only the winner is scored with a win and all the others are not counted. It's more in line with how the DPoints are distributed, and I've mentioned many times that I want all scoring systems to work in a similar way.
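
To see how much scoring data this change removes, here is a rough Python sketch comparing the two WTA rules for a 20-player game. It assumes the earlier rules (Survive:Survive 0.5:0.5, Survive:Defeated 0.75:0.25, Defeated:Defeated not counted) and reads the new rule as one 1:0 result for the winner against each other player; the function name and the 7-survivor/12-defeated split are invented for illustration.

```python
from itertools import combinations


def wta_results(statuses, winner_only=True):
    """Return (a, b, score_of_a) tuples for a WTA game.

    statuses maps player -> 'won', 'survived' or 'defeated'.
    winner_only=True: only the winner scores, 1:0 against everyone else.
    winner_only=False: the earlier breakdown with survivor/defeated pairs.
    """
    winner = next((p for p, s in statuses.items() if s == "won"), None)
    if winner is None:
        raise ValueError("a WTA game needs a winner")
    if winner_only:
        return [(winner, p, 1.0) for p in statuses if p != winner]
    score = {("survived", "survived"): 0.5,
             ("survived", "defeated"): 0.75,
             ("defeated", "survived"): 0.25}
    results = []
    for a, b in combinations(statuses, 2):
        if a == winner:
            results.append((a, b, 1.0))
        elif b == winner:
            results.append((a, b, 0.0))
        elif (statuses[a], statuses[b]) in score:
            results.append((a, b, score[(statuses[a], statuses[b])]))
        # Defeated:Defeated pairs are skipped.
    return results


statuses = {"P1": "won"}
statuses.update({f"S{i}": "survived" for i in range(7)})
statuses.update({f"D{i}": "defeated" for i in range(12)})
print(len(wta_results(statuses)), len(wta_results(statuses, winner_only=False)))
# 19 results under the new rule vs 124 under the old one
```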

I excluded the Rinascimento variant from the scores, because this variant is highly unbalanced on purpose and very popular here. There is no meaningful way of adding the stats of this variant to the generic rating.

A rating by variant, presstype and potType will follow as soon as the algorithm is set.

The rating does not update after a game finishes at the moment. I don't want to mess with the "official" webdip code until the formulas are done and the database table layout is finalized.

I still need to score the CDs.
