Andy wrote:
Rémi, you mentioned that the other algorithms predicted well, and guessed that it is because the great majority of games are between experienced players whose strength is not changing much. I also feel that the existing KGS ratings already work well for those players. So how about focusing on how the various algorithms perform for improving players? I think it would be interesting to simulate the game results of various improving players and show how the different rating algorithms respond.

For example: suppose a player's true strength is 1500 for some time, and then he suddenly improves to 2000. Both before and after, he plays a fixed number of games per day (say 10). Show a graph of what each rating algorithm would think his rating is over time. Many people complain that the KGS algorithm does not move fast enough in a case like this.
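For concreteness, such a simulation could be set up roughly like this (a minimal sketch of my own, not anything from the paper; the opponent pool and the standard 400-point Elo win-probability formula are assumptions):

    # Sketch: simulate game results for a player whose true strength
    # jumps from 1500 to 2000 partway through. Opponent pool and the
    # Elo logistic formula are illustrative assumptions.
    import random

    def win_prob(r_player, r_opponent):
        # Standard Elo win probability on a 400-point scale.
        return 1.0 / (1.0 + 10.0 ** ((r_opponent - r_player) / 400.0))

    def simulate(days=60, games_per_day=10, jump_day=30, seed=0):
        random.seed(seed)
        games = []  # (day, opponent_rating, player_won)
        for day in range(days):
            true_rating = 1500 if day < jump_day else 2000
            for _ in range(games_per_day):
                opponent = random.gauss(1700, 200)  # assumed opponent pool
                won = random.random() < win_prob(true_rating, opponent)
                games.append((day, opponent, won))
        return games

    # Feed `games` to each rating algorithm and plot its estimate of the
    # player's rating against `day` to compare how quickly each reacts.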

I believe that the main weakness of KGS (and of all decayed-history algorithms) is that rating uncertainty grows exponentially with time, when it should grow like the square root of time, which has a completely different shape. So, for players who play frequently (10 games per day is a lot!), the ratings get completely stuck. On the other hand, a player who stops playing for a while and comes back to the server will experience huge rating jumps. The WHR algorithm handles this correctly, but the KGS algorithm cannot, whatever parameters are used.
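To make the difference in shape concrete, here is a tiny illustration (my own, with arbitrary constants) of the two growth laws for the uncertainty of an idle player's rating:

    # Illustration only: how rating uncertainty grows with idle time
    # under the two models. Constants (sigma0, k, w) are made up.
    import math

    def sigma_decayed_history(t, sigma0=50.0, k=0.05):
        # Exponential growth, as in decayed-history schemes.
        return sigma0 * math.exp(k * t)

    def sigma_wiener(t, sigma0=50.0, w=10.0):
        # Wiener (Brownian) process: variance grows linearly with time,
        # so the standard deviation grows like the square root of time.
        return math.sqrt(sigma0 ** 2 + w ** 2 * t)

    for t in (0, 10, 50, 100, 365):
        print(t, round(sigma_decayed_history(t)), round(sigma_wiener(t)))

With the exponential law the uncertainty explodes after a long absence, which is what produces the huge rating jumps, while the square-root law grows much more gently.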

Your suggestion to illustrate the difference on artificial scenarios is good. In fact, you are not the first one to make it. I will probably use artificial scenarios in my presentation at the conference.


Also, the last paragraph of section 4 talks about how the model does not account for the different ability of new players to change (improve) their ratings compared to older players. Could you vary the parameter w based on the player's current rating? (Assume that players with low ratings are capable of improving more quickly than strong players.) I don't know enough about the math to know whether this would blow up the computation time, or whether it is simply impossible.

Yes, that is my next direction for improving the system. In this paper, I focused on comparing different approaches: incremental, decayed history, WHR, etc. In order to be fair, I used the same simple but wrong model for every algorithm. Now that I am convinced that WHR is significantly better than the alternative approaches, the next step is to improve the model.

Your idea is good, and it would not blow up the computation time. I have already started to work in this direction. I split the game database into levels by sorting games on the average strength of the two players, and tuned the optimal parameters for each level. Not surprisingly, I found that the optimal w² is higher for weaker players than for stronger players. I still have to find a nice way to handle the fact that, with a variable w², ratings no longer have a relative value but an absolute one. It then becomes important to avoid drift, and there are some other subtle problems.
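For illustration, a rating-dependent w² could be as simple as a lookup on the average strength of the two players; the thresholds and values below are made-up placeholders, not the tuned parameters from my experiments:

    # Hypothetical sketch: choose w^2 (rating volatility per unit time)
    # from the average strength of the two players. All numbers are
    # placeholders for illustration.
    def w_squared(avg_rating):
        if avg_rating < 1500:
            return 100.0   # weak players: allowed to change fast
        elif avg_rating < 2000:
            return 60.0
        else:
            return 30.0    # strong players: assumed more stable

    # In the Wiener-process prior, the variance added between two games
    # played at times t1 < t2 would then be w_squared(avg) * (t2 - t1).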

Thanks for your comments,

Rémi