[computer-go] How can one possibly design an optimal playout agent

Jacques Basaldúa Thu, 05 Jul 2007 02:19:09 -0700

Peter Drake wrote:

1) If the computation necessary to find better moves is too expensive, performing many "dumb" playouts may be a better investment.

2) If the playouts are too deterministic, and the moves are merely pretty good, the program may avoid an important move and thus misjudge the value of a position.


Chris Fant wrote:

IMO, this is the most interesting part of Computer Go today.  How can
one possibly design an optimal playout agent when making a playout
agent that plays strong is not the solution?  The only known method
seems to be trial and error.


That is the key question of UCT. I totally agree with Peter's conditions
and add another two: 3) (which should be 1) _Unbiased_! The smallest bias
ruins everything. And 4) Low variance.


Low variance is the key to improvement.

A random move is almost as bad as a pass move. E.g., you can win against
a random player by passing your first 180 moves. (I did it with Idiotbot,
which is not exactly a random player.) As an approximation, if you
consider a random move to be as bad as a pass move, the "blunder per move
ratio" would be equal to the temperature of the game. You are evaluating
the real value of the game by summing:

v_eval = v_real + t1 - t2 + t3 - t4 + ...

The condition of no bias is:

E[v_eval] = E[v_real]  <=>  E[t1 - t2 + ...] = 0

If the playout was perfect, you would evaluate

v_eval = v_real + 0 - 0 + 0 - 0 + ...

and you would only need one playout.
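
To make the bias condition concrete, here is a minimal simulation sketch
of the sum above. The uniform blunder distribution, the function names
(playout_eval, mean_eval) and all parameter values are my own assumptions
for illustration; they are not taken from any real program.

```python
import random

def playout_eval(v_real, n_moves, blunder_a, blunder_b, rng):
    """One playout under the toy model v_eval = v_real + t1 - t2 + t3 - ...

    Odd-numbered blunders (t1, t3, ...) are drawn uniformly from
    [0, blunder_a], even-numbered ones (t2, t4, ...) from [0, blunder_b].
    The uniform distribution is an assumption made for this sketch only.
    """
    v = v_real
    for i in range(n_moves):
        if i % 2 == 0:
            v += rng.uniform(0.0, blunder_a)   # + t1, + t3, ...
        else:
            v -= rng.uniform(0.0, blunder_b)   # - t2, - t4, ...
    return v

def mean_eval(v_real, n_moves, blunder_a, blunder_b, n_playouts, seed=0):
    """Average v_eval over many independent playouts."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_playouts):
        total += playout_eval(v_real, n_moves, blunder_a, blunder_b, rng)
    return total / n_playouts
```

With blunder_a == blunder_b and an even number of moves, the alternating
terms cancel in expectation and the mean stays near v_real. With
blunder_a = 1.1 and blunder_b = 1.0 over 100 moves, the expected shift is
50 * (0.55 - 0.50) = 2.5, and running more playouts does not remove it.
With both blunder sizes at 0, a single playout returns v_real exactly.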

The variance of the estimator strongly depends on the variance of the
Bernoulli process (= the "blunder per move ratio" if we put it that way)
in a way that drives v_eval -> 1/2 as |ti| grows or as n (the number of
terms in the sum) grows.
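
The drift toward 1/2 can be seen in a small sketch, assuming that a
playout counts as a win when its final value is positive and that
blunders are uniform in [0, blunder_size]. Both assumptions, and the
name win_rate, are mine and only serve the illustration.

```python
import random

def win_rate(v_real, n_moves, blunder_size, n_playouts, seed=0):
    """Fraction of playouts whose final value is > 0 in the toy model.

    Each playout computes v_real + t1 - t2 + t3 - ... with every t_i
    drawn uniformly from [0, blunder_size] (an assumed distribution).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_playouts):
        v = v_real
        for i in range(n_moves):
            t = rng.uniform(0.0, blunder_size)
            v += t if i % 2 == 0 else -t
        if v > 0:
            wins += 1
    return wins / n_playouts
```

For a position that is really won by 2 points, tiny blunders leave the
win rate close to 1, while large blunders, or the same blunders over many
more moves, push it toward 1/2, so the playouts carry less and less
information about v_real.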

It is not true that improving the playout is unimportant. Sylvain does not
claim that either; I have read him stating that it is important. But you
have to follow the rules:

  minimize "blunder per move ratio"

  subject to: The game is unbiased, fast and random enough

Some fast ideas could be favoring the moves near the precomputed
border of the territory to be defended (ownership maps), or similar
ideas that may be fast, unbiased and reduce the "blunder per move ratio".
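
One way the ownership-map idea might look in code is sketched below. It
assumes a precomputed map giving, for each point, the estimated
probability that Black ends up owning it, and it treats points with
probability near 0.5 as "border". That definition of border, the boost
parameter and the function names are hypothetical choices of mine.
Whether such a weighting keeps the playouts unbiased would have to be
checked by experiment.

```python
import random

def border_weights(ownership, boost=4.0):
    """Map each point to a sampling weight.

    ownership: dict point -> estimated probability in [0, 1] that Black
    owns the point (assumed to be precomputed elsewhere). Points near 0.5
    are treated as border and get weight up to 1 + boost; settled points
    keep weight 1, so every move stays possible.
    """
    return {p: 1.0 + boost * (1.0 - abs(2.0 * q - 1.0))
            for p, q in ownership.items()}

def pick_move(legal_moves, weights, rng):
    """Sample one legal move, favoring border points but excluding none."""
    w = [weights.get(m, 1.0) for m in legal_moves]
    return rng.choices(legal_moves, weights=w, k=1)[0]
```

Computing the weights once per search (not once per playout move) keeps
the playout itself almost as fast as a uniformly random one, and the
weight floor of 1 keeps it "random enough".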

Jacques.

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
