I really don't like the idea of ranking moves and scoring an evaluator by how far the pro's move falls from the top of the list. This is worthless if we ever want to surpass humans (not a concern now, but it is in principle), and we have no reason to believe a move is weak just because a pro didn't pick it. Another pro might pick a different move in the same situation. It would be one thing if we had a pro who ranked all the legal moves, or at least the top 10 or so, but those data are almost never available. Or if we had two pros watching a game, each giving what they thought was the best move at each point, and we scored an evaluator only on the positions where the pros agreed (although that would bias the scoring towards things like forced moves). Also, there are bad moves and then there are even worse moves (like filling the eyes of a powerful living group and killing it). An evaluator that sometimes makes catastrophic evaluations and other times plays a perfect pro move could still be much worse on balance, if we can't tell when it is blundering versus being brilliant.
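Just to be concrete about what I mean by rank-based scoring, here is a rough Python sketch of it. The evaluator's rank_moves(position) method and the (position, pro_move) data are placeholders I am making up for illustration, not any existing interface.

# Rough sketch of rank-based scoring against pro moves -- the scheme I am
# objecting to above.  "evaluator" is assumed to expose rank_moves(position),
# returning the legal moves ordered best-first, and "games" is a list of
# (position, pro_move) pairs.  Both are placeholders, not real APIs.

def rank_based_score(evaluator, games):
    """Average distance of the pro's move from the top of the ranked list.

    Lower is "better" under this metric, but note its blind spots: a move
    the pro didn't pick is not necessarily weak, and a single catastrophic
    evaluation is penalized no more than a mild disagreement.
    """
    total_rank = 0
    scored = 0
    for position, pro_move in games:
        ranking = evaluator.rank_moves(position)   # best move first
        if pro_move in ranking:
            total_rank += ranking.index(pro_move)  # 0 == agreed with the pro
            scored += 1
    return total_rank / scored if scored else float("inf")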
I think it would be much more informative to compare evaluator A and evaluator B in the following way. Build a bot that searches to a fixed depth d (maybe 1 or 2, something small) and then calls the static evaluator. Try to determine, as accurately as possible, the strength of a bot using A and a bot using B against a variety of opponents. The better evaluator is defined to be the one that results in the stronger bot. Obviously this method introduces a whole host of new problems (even finding the "strength" of a running bot is non-trivial), but at least it attempts to measure the thing we would eventually care about: playing strength. Of course we also care about how fast the static evaluators are, because a faster evaluator might let us search more nodes, but for measuring the quality of the evaluations themselves I can't at the moment think of a better way. (A rough sketch of the harness I have in mind is at the bottom of this message, below the quoted text.)

One problem with my suggestion is that maybe an evaluator is only accurate beyond a certain depth, and if we could just get to that depth before calling it, it would look much better. Or maybe changes to how the search works can compensate for weaknesses in an evaluator or emphasize its strengths. Really one would want the strongest possible bot built around one evaluator versus the strongest possible bot built around the other, but that is clearly impossible to achieve.

I guess another question is: what would you need to see a static evaluator do to be so convinced it was useful that you then built a bot around it? Would it need to win games all by itself with one-ply lookahead?

- George

On Tue, Feb 17, 2009 at 2:41 PM, Dave Dyer <dd...@real-me.net> wrote:
>
> This is old and incomplete, but still is a starting point you might
> find useful: http://www.andromeda.com/people/ddyer/go/global-eval.html
>
> General observations (from a weak player's point of view):
>
> Go is played on a knife edge between life and death. The only evaluator
> that matters is "is this stone alive", and there are no known proxies
> that will not fall short a significant amount of the time. If you fall
> short once or twice in a game against a competent player, you will lose.
>
> General strategic considerations will play you false every time.
>
> -- Notwithstanding the above, improving general considerations
> will improve play, but not much. It's all about the minutiae of
> the situation.
>
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
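P.S. Here is a very rough sketch (in Python) of the comparison harness I have in mind. None of this is real engine code: the game interface (legal_moves / play / undo / is_over), the evaluator signature, the opponent pool, and the play_game function are all placeholders I am assuming for illustration, and the strength estimate is just a win rate with a crude standard error rather than a proper rating.

import math
import random

# Assumed placeholders: a "game" exposes legal_moves(), play(move), undo()
# and is_over(); an "evaluator" maps a game to a score for the side to move;
# and play_game(player_a, player_b) plays one game and returns True iff
# player_a wins.  All of these would have to be supplied by the actual bot.

def negamax(game, evaluator, depth):
    """Search to a fixed depth d, then fall back on the static evaluator."""
    if depth == 0 or game.is_over():
        return evaluator(game)
    best = -math.inf
    for move in game.legal_moves():
        game.play(move)
        best = max(best, -negamax(game, evaluator, depth - 1))
        game.undo()
    return best

def make_bot(evaluator, depth):
    """Wrap a static evaluator in a fixed-depth search to get a move chooser."""
    def choose(game):
        best_score, best_move = -math.inf, None
        moves = list(game.legal_moves())
        random.shuffle(moves)                    # break ties arbitrarily
        for move in moves:
            game.play(move)
            score = -negamax(game, evaluator, depth - 1)
            game.undo()
            if score > best_score:
                best_score, best_move = score, move
        return best_move
    return choose

def estimated_strength(bot, opponents, play_game, games_per_opponent=200):
    """Win rate against a pool of opponents, with a rough standard error.

    A crude stand-in for "strength"; a real measurement would want colour
    alternation, a rating model (Elo or similar), and many more games.
    """
    wins = total = 0
    for opponent in opponents:
        for _ in range(games_per_opponent):
            wins += 1 if play_game(bot, opponent) else 0
            total += 1
    rate = wins / total
    return rate, math.sqrt(rate * (1.0 - rate) / total)

# The better evaluator is then simply the one whose bot comes out stronger:
#   strength_a = estimated_strength(make_bot(eval_a, depth=2), opponents, play_game)
#   strength_b = estimated_strength(make_bot(eval_b, depth=2), opponents, play_game)

The point is only that the comparison bottoms out in games won, not in agreement with a ranked list of pro moves.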