Andy wrote:
Remi, you mentioned that the other algorithms predicted well, and
guessed that it's because the great majority of games are between
experienced players whose strength is not changing much. I also feel
that the existing KGS ratings already work well for those players. So
how about focusing on how the various algorithms perform for
improving players? I think it would be interesting to simulate
game results of various improving players and show how the different
rating algorithms behave.
For example: Suppose a player's true strength is 1500 for some time,
and then he suddenly improves to 2000. Both before and after, he plays
a fixed number of games per day (say 10). Show a graph of what each
rating algorithm thinks his rating is over time. Many people
complain that the KGS algorithm does not move fast enough in a case
like this.
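The scenario above is easy to simulate. The sketch below is only illustrative: it uses a plain incremental Elo update with an assumed K-factor of 16 and a fixed 1700-strength opponent pool (none of which are the actual KGS parameters), just to show how an incremental estimate lags a sudden jump in true strength.

```python
import random

def win_prob(r_a, r_b):
    """Expected score of A against B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def simulate(days_before=30, days_after=30, games_per_day=10,
             k=16.0, opponent=1700.0, seed=0):
    """Track an incremental Elo estimate while the player's true
    strength jumps from 1500 to 2000 halfway through. All constants
    are illustrative assumptions, not KGS settings."""
    rng = random.Random(seed)
    estimate = 1500.0
    history = []
    for day in range(days_before + days_after):
        true_strength = 1500.0 if day < days_before else 2000.0
        for _ in range(games_per_day):
            # Sample a game result from the player's true strength,
            # then update the estimate from the observed result.
            won = rng.random() < win_prob(true_strength, opponent)
            estimate += k * ((1.0 if won else 0.0)
                             - win_prob(estimate, opponent))
        history.append(estimate)
    return history
```

Plotting `simulate()` for each algorithm under test would give exactly the graph Andy asks for; the incremental update shown here takes many days of play to catch up after the jump.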
I believe that the main weakness of KGS (and of all decayed-history
algorithms) is that rating uncertainty grows exponentially with time,
when it should grow like the square root of time, which has a
completely different shape. So, in the case of players who play
frequently (10 games per day is a lot!), the ratings get completely
stuck. On the other hand, a player who stops playing for a while and
then comes back to the server will experience huge rating jumps. The
WHR algorithm handles this correctly, but the KGS algorithm cannot,
no matter what parameters are used.
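The square-root shape comes from the Wiener-process (Brownian) prior that WHR places on each player's rating over time: increments are Gaussian with variance proportional to elapsed time, so

```latex
% Wiener prior on rating evolution, as used in the WHR model:
r(t_2) - r(t_1) \sim \mathcal{N}\bigl(0,\; w^2\,|t_2 - t_1|\bigr)
\quad\Longrightarrow\quad
\sigma(\Delta t) = w\sqrt{\Delta t}
```

whereas discounting past games at a fixed rate per unit time makes the effective uncertainty about a long-inactive player blow up much faster, producing the stuck-then-jumping behavior described above.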
Your suggestion to illustrate the difference on artificial scenarios is
good. In fact, you are not the first one to make it. I will probably use
artificial scenarios in my presentation at the conference.
Also, the last paragraph of section 4 talks about how the model does
not account for the different ability of new players to change
(improve) their ratings compared to older players. Could you vary the
parameter w based on the player's current rating? (Assume players
with low ratings are capable of improving more quickly than strong
players.) I don't know enough about the math to know whether this would
blow up the computation time, or whether it is simply impossible.
Yes, that is my next direction for improving the system. In this paper, I
focused on comparing the different approaches: incremental, decayed
history, WHR, etc. In order to be fair, I used the same simple but wrong
model for every algorithm. Now that I am convinced that WHR is significantly
better than the alternative approaches, the next step is to improve the
model.
Your idea is good, and it would not blow up the computation time. I have
already started to work in this direction: I split the game database
into levels according to the average strength of the two players, and
tuned optimal parameters for each level. Not surprisingly, I
found that the optimal w² is higher for weaker players than for
stronger players. I still have to find a nice way to handle the fact
that with a variable w², ratings no longer have a relative value but
an absolute one. It then becomes important to avoid drift, among
other subtle problems.
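The bucketing step Rémi describes can be sketched in a few lines. This is a hypothetical illustration, not code from any real WHR implementation: the game records, the 500-point level width, and the `score` callback (e.g. held-out prediction accuracy) are all assumptions.

```python
from collections import defaultdict

def bucket_games(games, width=500):
    """Group games into rating levels by the average rating of the
    two players. `width` is an assumed level size in Elo points."""
    levels = defaultdict(list)
    for g in games:
        avg = (g["white_rating"] + g["black_rating"]) / 2.0
        levels[int(avg // width)].append(g)
    return levels

def tune_w2(games, candidates, score):
    """Return the w^2 candidate that maximizes `score` (e.g. held-out
    prediction accuracy) on the given games."""
    return max(candidates, key=lambda w2: score(games, w2))
```

Running `tune_w2` separately on each bucket returned by `bucket_games` would yield a per-level w², matching the finding that weaker players' ratings drift faster.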
Thanks for your comments,
Rémi
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/