I really don't like the idea of ranking moves and scoring an
evaluator by how far down its ranked list the pro's move falls.  This
is worthless if we ever want to surpass humans (not a concern now,
but it is in principle), and we have no reason to believe a move
isn't strong just because a pro didn't pick it.  Perhaps another pro
would pick a different move in the same situation.  If we had a pro
who ranked all the legal moves, or at least the top 10 or so, that
would be one thing, but those data are almost never available.  The
same would be true if we had two pros watching a game, each giving
what they thought was the best move at each point, and we scored an
evaluator only on the positions where the pros agreed (although that
would bias the scoring towards things like forced moves).  Also,
there are bad moves and then there are even worse moves (like filling
in the eyes of a powerful living group and killing it).  An evaluator
that makes catastrophic evaluations sometimes and plays the perfect
pro move other times could still be much worse on balance, if we
can't tell when it is blundering versus being brilliant.
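
For concreteness, here is roughly the scoring scheme I mean, as a
small Python sketch.  evaluate(), legal_moves() and the test set are
placeholders for whatever evaluator and pro-game data would actually
be used; none of this is anyone's real code.

    # Sketch of the "rank of the pro move" metric criticized above.
    # evaluate(position, move) and legal_moves(position) are
    # placeholders for the static evaluator and move generator under
    # test.
    def pro_move_rank(position, pro_move, evaluate, legal_moves):
        # Score every legal move with the evaluator, best first.
        ranked = sorted(legal_moves(position),
                        key=lambda m: evaluate(position, m),
                        reverse=True)
        # Distance of the pro's choice from the top of the list
        # (0 means the evaluator's top choice matched the pro).
        return ranked.index(pro_move)

    def average_rank(test_set, evaluate, legal_moves):
        # test_set is a list of (position, pro_move) pairs.
        ranks = [pro_move_rank(pos, mv, evaluate, legal_moves)
                 for pos, mv in test_set]
        return sum(ranks) / float(len(ranks))

Everything above about second opinions and catastrophic blunders is a
reason a low average rank here need not mean a good evaluator.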

I think it would be much more informative to compare evaluator A and
evaluator B in the following way.  Make a bot that searches to a
fixed depth d and then calls the static evaluator at the leaves
(maybe this depth is 1 or 2 or something small).  Then try to
determine, as accurately as possible, the strength of the bot using A
and the bot using B against a variety of opponents.  The better
evaluator is defined to be the one that results in the stronger bot.
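
A minimal sketch of the kind of bot I have in mind, in Python.  The
functions evaluate(), legal_moves() and play() are placeholders for
whatever evaluator and game logic get plugged in, and evaluate() is
assumed to score a position for the side to move; terminal handling
and passing are ignored.

    # Fixed-depth negamax that falls back on the static evaluator at
    # the leaves.
    def negamax(position, depth, evaluate, legal_moves, play):
        moves = legal_moves(position)
        if depth == 0 or not moves:
            return evaluate(position)
        best = float('-inf')
        for move in moves:
            score = -negamax(play(position, move), depth - 1,
                             evaluate, legal_moves, play)
            if score > best:
                best = score
        return best

    # The "bot": pick the move whose backed-up evaluation is best.
    def pick_move(position, depth, evaluate, legal_moves, play):
        return max(legal_moves(position),
                   key=lambda m: -negamax(play(position, m), depth - 1,
                                          evaluate, legal_moves, play))

Swap evaluator A for evaluator B, keep depth d and everything else
fixed, and play both versions against the same set of opponents.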

Obviously this method introduces a whole host of new problems (even
finding the "strength" of a running bot is non-trivial), but at least
it attempts to measure what we would eventually care about --- playing
strength.  Of course we also care about how fast the static
evaluators are, because we might be able to search more nodes with a
faster one, but for measuring the quality of the evaluations
themselves, I can't at the moment think of a better way of doing it.
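
For what it's worth, the crudest version of "strength against a
variety of opponents" might look something like this, where
play_game() is a placeholder that plays one game and reports whether
the bot won:

    import math

    # Crude strength estimate: overall win rate plus a rough error
    # bar, pooled over all opponents.
    def estimate_strength(bot, opponents, games_per_opponent, play_game):
        wins, games = 0, 0
        for opp in opponents:
            for _ in range(games_per_opponent):
                if play_game(bot, opp):
                    wins += 1
                games += 1
        p = wins / float(games)
        # Normal-approximation standard error on the win rate.
        se = math.sqrt(p * (1.0 - p) / games)
        return p, se

Even a few hundred games leaves a standard error of a couple of
percent, which is part of why pinning down "strength" is non-trivial
when the two evaluators are close.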

One problem with my suggestion is that maybe an evaluator is much
more accurate once the search is a certain number of moves deep, and
if we could just get to that depth before calling it, it would look
much better.  Or maybe changes to how the search works can compensate
for weaknesses in an evaluator or emphasize its strengths.  Really
one would want to compare the strongest possible bot built around one
evaluator against the strongest possible bot built around the other,
but that is clearly impossible to achieve.

I guess another question is: what would you need to see a static
evaluator do to be so convinced it was useful that you would then
build a bot around it?  Would it need to win games all by itself with
one-ply lookahead?

- George

On Tue, Feb 17, 2009 at 2:41 PM, Dave Dyer <dd...@real-me.net> wrote:
>
> This is old and incomplete, but still is a starting point you might
> find useful  http://www.andromeda.com/people/ddyer/go/global-eval.html
>
> General observations (from a weak player's point of view):
>
> Go is played on a knife edge between life and death.  The only evaluator
> that matters is "is this stone alive", and there are no known proxies
> that will not fall short a significant amount of the time.  If you fall
> short once or twice in a game against a competent player, you will lose.
>
> General strategic considerations will play you false every time.
>
> -- Notwithstanding the above, improving general considerations
> will improve play, but not much.  It's all about the minutia of
> the situation.
>