Regarding correspondence with human ranks, and handicap value, I cannot tell yet. It is very clear to me that the Elo-rating model is very wrong for the game of Go, because strength is not one-dimensional, especially when mixing bots and humans. The best way to evaluate a bot in terms of human rating is to make it play against humans, on KGS for instance. Unfortunately, there is no 9x9 rating there. I will compute 9x9 ratings with the KGS data I have.
I'm only interested in measuring the Elo gaps between 9x9 players of different (19x19) rankings. You can do this simply by taking the win/loss statistics between players of various strengths.

I don't really know what you mean by "one-dimensional." My understanding is that playing strength is not one-dimensional in the sense that it is foiled by intransitivities between players with different styles: you may be able to beat me, but I might be able to beat someone that you cannot. If that's what you are saying, how does the kyu/dan system handle it in a way that makes it superior to Elo for predicting who will win? Is there some mechanism that measures playing styles? I don't see one.

What I THINK you mean is that the gap between various Go ranks is not consistent in Elo terms. In other words, there is no single constant that works in my simple formula. I think this is probably the case, but surely it can be easily handled by a slightly more sophisticated formula that "fits the curve."

So surely the average 2 dan player will beat the average 1 dan player some statistically measurable percentage of the time at 9x9 Go. This is what I want to know. Then I want to know whether that percentage is the same at different points on the scale. If not, we find an appropriate statistical fit.

Once this is done, we still have the problem of calibrating CGOS: we have to determine which Elo rating on CGOS corresponds to 1 dan (or some arbitrary AGA or KGS ranking).

Once all of this is done, we at least have something that doesn't yet exist: a credible way to claim your 9x9 program would likely hold its own against a 19x19 player of a given level. Of course this will be somewhat noisy, as any ranking system is. It will be subject to intransitivities just like Elo and Go rankings are. But you have to admit that there has been talk about certain 9x9 programs playing at the "dan level" and so on.
Technically this makes no sense, but intuitively we know exactly what we mean when we say that - we mean that it is the "equal" of a 1 dan 19x19 player. This is what I want to capture as a footnote on the 9x9 server.

I agree with you about programs playing different versions of themselves. I can throw out games where a program plays another version of itself if you want to study that too (I would go by password). Just let me know and I will run another "hall of fame" using that criterion, or I can send you the data from CGOS in a compact (one line per game) representation if you think it would help you understand this. Or I can send you the PGN files I produced to be compatible with bayeselo.

- Don
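
For illustration, the win/loss measurement described above could be sketched roughly as follows. This is only a sketch under stated assumptions, not what CGOS or bayeselo actually does: it assumes the standard logistic Elo curve, it assumes ranks are encoded on a hypothetical integer scale where a higher number means a stronger player (say 1 = 1 dan, 2 = 2 dan), and the function names and game counts are made up for the example.

```python
import math
from collections import defaultdict


def elo_gap(wins, losses):
    """Elo difference implied by a win/loss record, assuming the logistic Elo model."""
    if wins == 0 or losses == 0:
        raise ValueError("need at least one win and one loss to estimate a gap")
    p = wins / (wins + losses)  # observed win fraction of the stronger side
    return 400.0 * math.log10(p / (1.0 - p))


def gaps_by_rank(games):
    """Estimate the Elo gap between each pair of adjacent ranks.

    games: iterable of (winner_rank, loser_rank) pairs, using the
    hypothetical integer rank scale (higher number = stronger player).
    Returns {lower_rank: Elo gap between lower_rank and lower_rank + 1}.
    """
    # lower rank -> [wins by the stronger player, wins by the weaker player]
    record = defaultdict(lambda: [0, 0])
    for winner, loser in games:
        if abs(winner - loser) != 1:
            continue  # only games between adjacent ranks are used here
        lower = min(winner, loser)
        if winner > loser:
            record[lower][0] += 1
        else:
            record[lower][1] += 1
    return {r: elo_gap(w, l) for r, (w, l) in sorted(record.items()) if w and l}


# Made-up counts, purely to show the shape of the calculation.
games = [(2, 1)] * 64 + [(1, 2)] * 36 + [(3, 2)] * 75 + [(2, 3)] * 25
for rank, gap in gaps_by_rank(games).items():
    print(f"rank {rank} vs rank {rank + 1}: about {gap:.0f} Elo")
```

If the gaps printed for different ranks come out clearly different, that is the case where no single constant works and a fitted curve is needed. Calibrating against CGOS would then be a separate step that fixes which Elo rating corresponds to one chosen rank.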