Re: [computer-go] Where and How to Test the Strong Programs?

Don Dailey Thu, 13 Dec 2007 13:33:45 -0800

Regarding correspondence with human ranks, and handicap value, I cannot
tell yet. It is very clear to me that the Elo-rating model is very wrong
for the game of Go, because strength is not one-dimensional, especially
when mixing bots and humans. The best way to evaluate a bot in terms of
human rating is to make it play against humans, on KGS for instance.
Unfortunately, there is no 9x9 rating there. I will compute 9x9 ratings
with the KGS data I have.


I'm only interested in measuring the ELO gaps between 9x9 players of
different (19x19) rankings.   This you can do by simply taking the
statistics of wins and losses between players of various strengths. 
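
As a rough sketch of that calculation: assuming the standard logistic Elo
model (a 400-point gap means 10:1 odds), a win/loss record between two
groups of players converts to an Elo gap as below. The function name and
the counts are illustrative only; they are not taken from KGS or CGOS data.

```python
import math

def elo_gap(wins, losses):
    """Elo difference implied by a win/loss record (logistic Elo model).

    wins/losses are counted from the stronger group's point of view.
    Assumption: draws are ignored and both counts are non-zero.
    """
    if wins <= 0 or losses <= 0:
        raise ValueError("need at least one win and one loss")
    p = wins / (wins + losses)          # observed winning fraction
    return 400.0 * math.log10(p / (1.0 - p))

# Hypothetical record: 2 dan players win 75 of 100 games against 1 dan players.
print(round(elo_gap(75, 25), 1))
```

With few games the estimate is very noisy, so in practice one would want
a large sample for each pair of ranks before trusting the number.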

I don't really know what you mean by "one-dimensional."   My
understanding is that playing strength is not one-dimensional in the
sense that ratings are foiled by intransitivities between players with
different styles.    You may be able to beat me,  but I might be able to
beat someone that you cannot.      If that's what you are saying, how
does the kyu/dan system handle it in a way that makes it superior to ELO
for predicting who will win?    Is there some mechanism that measures
playing styles?    I don't see this.

What I THINK you mean is that the gap between various Go ranks is not
consistent in ELO terms.   In other words, there is no single constant
that works in my simple formula.     I think this is probably the case,
but surely it can be easily handled by a slightly more sophisticated
formula that "fits the curve."

So surely, the average 2 dan player will beat the average 1 dan player
some statistically measurable percentage of the time at 9x9 go.   This
is what I want to know.    Then I want to know if that percentage is the
same at different points in the scale.   If not,  then we find an
appropriate fit statistically.   
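
One simple way to picture such a fit is a piecewise curve: estimate the
Elo gap separately for each pair of adjacent ranks and accumulate the
gaps. This is only a sketch under the logistic Elo assumption; the
function names and the records are hypothetical, and a real fit would
use all games rather than only those between adjacent ranks.

```python
import math

def elo_gap(wins, losses):
    """Elo gap implied by a win/loss record (logistic Elo model)."""
    if wins <= 0 or losses <= 0:
        raise ValueError("need at least one win and one loss")
    p = wins / (wins + losses)
    return 400.0 * math.log10(p / (1.0 - p))

def rank_curve(records):
    """Cumulative Elo per rank, with the weakest rank fixed at 0.

    records[i] is (wins, losses) of rank i+1 against rank i, listed
    from the weakest pair of ranks upward.
    """
    curve = [0.0]
    for wins, losses in records:
        curve.append(curve[-1] + elo_gap(wins, losses))
    return curve

# Hypothetical data: 2k vs 3k, 1k vs 2k, 1d vs 1k.
records = [(70, 30), (65, 35), (60, 40)]
for step, elo in enumerate(rank_curve(records)):
    print(step, round(elo))
```

If the gaps printed for successive steps are roughly equal, a single
constant per rank would do; if they drift, the accumulated curve is the
"fit" to use instead.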

Once this is done then we still have the problem of calibrating CGOS -
we have to determine which ELO rating on CGOS corresponds to 1 dan (or
some arbitrary AGA or KGS ranking.)    
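
The calibration step could look roughly like this, assuming one anchor
point is known (a CGOS rating believed to equal some rank) and assuming
a per-rank Elo curve is already available. Every number and name here is
made up for illustration; none of them is an actual CGOS, AGA or KGS
value.

```python
def rank_index_for_rating(rating, anchor_rating, curve, anchor_index):
    """Map a server rating to a position on a per-rank Elo curve.

    curve[i] is the Elo of rank i relative to the weakest rank and is
    assumed to be non-decreasing. anchor_rating is the server rating
    assumed to correspond to curve[anchor_index]. Returns the index of
    the strongest rank whose Elo the rating reaches (0 if none).
    """
    target = rating - anchor_rating + curve[anchor_index]
    best = 0
    for i, elo in enumerate(curve):
        if elo <= target:
            best = i
    return best

# Hypothetical curve for 3k, 2k, 1k, 1d and a hypothetical anchor:
# a server rating of 2000 is taken to mean 1k (index 2).
curve = [0.0, 150.0, 280.0, 400.0]
labels = ["3k", "2k", "1k", "1d"]
print(labels[rank_index_for_rating(2130, 2000, curve, 2)])
```

The whole scheme is only as good as the anchor, so more than one anchor
point would be needed to check that the two scales really line up.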

Once all of this is done, we at least have something that doesn't yet
exist: a credible way to claim your 9x9 program would likely hold its
own against a 19x19 player of a given level.

Of course this will be somewhat noisy,  as any ranking system is.   It
will be subject to intransitivities just like ELO and Go rankings
are.      But you have to admit that there has been talk about certain
9x9 programs playing at the "dan level" and so on.     Technically this
makes no sense,  but intuitively we know exactly what we mean when we
say that - we mean that it is the "equal" of a 1 dan 19x19 player.   
This is what I want to capture as a footnote on the 9x9 server.

I agree with you about programs playing different versions of
themselves.   I can throw out games where a program plays another
version of itself if you want to study that too (I would go by
password.)    Just let me know and I will run another "hall of fame"
using that criterion, or I can send you the data from CGOS in a compact
(1 line per game) representation if you think it would be useful to help
you understand this.   Or I can send you the PGN files I produced to be
compatible with bayeselo.


- Don


 
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
