Peter Drake wrote:
1) If the computation necessary to find better moves is too
expensive, performing many "dumb" playouts may be a better investment.
2) If the playouts are too deterministic, and the moves are merely
pretty good, the program may avoid an important move and thus
misjudge the value of a position.
Chris Fant wrote:
IMO, this is the most interesting part of Computer Go today. How can
one possibly design an optimal playout agent when making a playout
agent that plays strong is not the solution? The only known method
seems to be trial and error.
That is the key question of UCT. I totally agree with Peter's conditions
and would add two more: 3) (which should really be 1) _Unbiased_! The
smallest bias ruins everything. And 4) Low variance.
Low variance is the key to improvement.
A random move is almost as bad as a pass move. E.g. you can win against
a random player by passing your first 180 moves. (I did it with Idiotbot,
which is not exactly a random player.) As an approximation, if you
consider a random move to be as bad as a pass move, the blunder per move
ratio would be equal to the temperature of the game. You are estimating
the real value of the game by summing the blunders ti made along the
playout:
v_eval = v_real + t1 - t2 + t3 - t4 + ...
The condition of no bias is:
E[v_eval] = E[v_real] <=> E[t1 - t2 + ...] = 0
If the playout were perfect, you would evaluate
v_eval = v_real + 0 - 0 + 0 - 0 + ...
and you would only need one playout.
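To make the sum above concrete, here is a minimal Python sketch. It
assumes each move is an independent blunder with a fixed probability and
a fixed cost, which is a simplification for illustration and not
something measured from real playouts. The names playout_value, estimate,
blunder_rate and blunder_size are made up for this example.

```python
import random

def playout_value(v_real, n_moves, blunder_rate, blunder_size, rng):
    """Return v_eval = v_real + t1 - t2 + t3 - ... for one simulated playout."""
    v = v_real
    for i in range(n_moves):
        # Assumption: each move is a blunder of fixed size with fixed probability.
        t = blunder_size if rng.random() < blunder_rate else 0.0
        # t1, t3, ... are added and t2, t4, ... are subtracted, as in the sum.
        v += t if i % 2 == 0 else -t
    return v

def estimate(v_real, n_moves, blunder_rate, blunder_size, n_playouts, seed=0):
    """Return (mean, variance) of v_eval over n_playouts simulated playouts."""
    rng = random.Random(seed)
    samples = [playout_value(v_real, n_moves, blunder_rate, blunder_size, rng)
               for _ in range(n_playouts)]
    mean = sum(samples) / n_playouts
    var = sum((s - mean) ** 2 for s in samples) / n_playouts
    return mean, var

if __name__ == "__main__":
    for rate in (0.0, 0.05, 0.3):
        mean, var = estimate(5.5, 100, rate, 1.0, 2000)
        print(f"blunder_rate={rate}: mean={mean:.2f} variance={var:.2f}")
```

In this sketch both sides blunder at the same rate over an even number of
moves, so E[t1 - t2 + ...] = 0 and the mean stays near v_real. The
variance grows with the blunder rate over the rates shown, and at rate 0
every playout returns v_real exactly.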
The variance of the estimator depends strongly on the variance of the
Bernoulli process (the "blunder per move ratio", so to speak), in a way
that drives v_eval -> 1/2 as |ti| or n grows.
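One possible reading of "v_eval -> 1/2" is that each playout is scored
only as a win or a loss, so the noise pulls the observed win rate toward
a coin flip. The Python sketch below illustrates that reading under the
same simplifying assumption as before (independent blunders of fixed
size and probability). The function win_rate and its parameters are
hypothetical names, not taken from any program.

```python
import random

def win_rate(v_real, n_moves, blunder_rate, blunder_size, n_playouts, seed=0):
    """Fraction of simulated playouts whose noisy final value is positive."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_playouts):
        v = v_real
        for i in range(n_moves):
            # Assumption: a move is a blunder of fixed size with fixed probability.
            t = blunder_size if rng.random() < blunder_rate else 0.0
            # Alternate the sign, following v_real + t1 - t2 + t3 - ...
            v += t if i % 2 == 0 else -t
        # The playout counts as a win when the noisy value ends above zero.
        if v > 0:
            wins += 1
    return wins / n_playouts

if __name__ == "__main__":
    for size in (0.0, 1.0, 5.0, 20.0):
        rate = win_rate(5.5, 100, 0.3, size, 2000)
        print(f"blunder_size={size}: win rate={rate:.3f}")
```

With no blunders the position, truly won by 5.5 points, is scored as a
win every time. As the blunders get larger the observed win rate drifts
toward 1/2, so many more playouts are needed to tell a won position from
a lost one.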
It is not true that improving the playout is unimportant. Sylvain does
not claim that either. I have read him stating that it is important. But
you have to follow the rules:
minimize "blunder per move ratio"
subject to: The game is unbiased, fast and random enough
Some quick ideas: favoring moves near the precomputed border of the
territory to be defended (ownership maps), or similar ideas that may be
fast and unbiased and reduce the "blunder per move ratio".
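As a rough illustration of the ownership-map idea, the Python sketch
below weights each candidate move by how contested its point is. It
assumes an ownership map is already available as a dict from point to
the estimated probability that Black ends up owning it. The weighting
formula, the floor parameter and the function names are all hypothetical,
and the sketch makes no attempt to show that the resulting playouts are
unbiased.

```python
import random

def border_weights(ownership, floor=0.1):
    """Map each point to a sampling weight that peaks where ownership is 0.5."""
    # 1 - |2p - 1| is 1 on the contested border (p = 0.5) and 0 deep inside
    # settled territory (p = 0 or 1). The floor keeps every move possible,
    # so the playout stays random.
    return {pt: floor + (1.0 - abs(2.0 * p - 1.0)) for pt, p in ownership.items()}

def pick_move(ownership, rng, floor=0.1):
    """Sample one point with probability proportional to its border weight."""
    weights = border_weights(ownership, floor)
    points = list(weights)
    return rng.choices(points, weights=[weights[pt] for pt in points], k=1)[0]

if __name__ == "__main__":
    # Toy ownership map: (row, col) -> estimated P(Black owns the point).
    ownership = {(0, 0): 0.95, (0, 1): 0.55, (0, 2): 0.10}
    rng = random.Random(0)
    picks = [pick_move(ownership, rng) for _ in range(1000)]
    for pt in ownership:
        print(pt, picks.count(pt))
```

The lookup is a single pass over the candidate points, so it stays cheap
compared with any real move evaluation. Whether it actually lowers the
blunder rate would have to be tested by trial and error.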
Jacques.
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/