A new position is always visited unless the leaf of the tree is the
end of the game. In that case, one player always wins, so the other
always loses. The losing player will then explore all the other moves
to avoid the sure loss. If all moves are still losing, the loss
propagates to the move before, exploration begins there, and so
on.
("min" --> "loss" I guess)
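
The propagation I describe above is just exact minimax backup of proven
results. A toy sketch (the representation and the function name are my own,
not MoGo's): a leaf holds the game value for the side to move (+1 win,
-1 loss), and a loss at every child becomes a loss one ply up.

```python
def game_value(node):
    """Exact negamax value for the side to move: +1 = sure win, -1 = sure loss.

    A leaf is an int (the game result for the side to move there);
    an internal node is a list of children reached by the legal moves.
    """
    if isinstance(node, int):
        return node  # game over at this leaf
    # The side to move picks the move that is worst for the opponent.
    return max(-game_value(child) for child in node)

# Every move leads to a win for the opponent (+1 at each child),
# so the sure loss propagates to the position before:
assert game_value([1, 1]) == -1
# One more ply up, the roles flip: the player before that has a sure win.
assert game_value([[1, 1], [1, 1]]) == 1
```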

As far as I can see,
if RAVE gives a constant value of 0 to one move, that move will never be
tested as long as other moves have non-zero AMAF values.

A move
with a "real" empirical winning probability of 0 and an AMAF value of 0.01
will always be preferred to a non-simulated move with an AMAF value of 0.0,
whatever the number of simulations.
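
To make the scenario concrete, here is a toy sketch of a pure RAVE-style
selection score with no exploration term. The beta schedule and the
constant k are my own assumptions (reconstructed from the usual
beta = k / (k + n) form), not MoGo's actual formula:

```python
def rave_score(mc_wins, mc_sims, amaf_wins, amaf_sims, k=1000):
    """Mix the Monte-Carlo mean and the AMAF mean; the AMAF weight beta
    decays as the move accumulates real simulations (assumed schedule)."""
    mc_mean = mc_wins / mc_sims if mc_sims else 0.0
    amaf_mean = amaf_wins / amaf_sims if amaf_sims else 0.0
    beta = k / (k + mc_sims)
    return beta * amaf_mean + (1.0 - beta) * mc_mean

# A move that lost every one of 10,000 real playouts but has AMAF value 0.01:
losing_move = rave_score(mc_wins=0, mc_sims=10_000, amaf_wins=100, amaf_sims=10_000)
# A never-simulated move whose AMAF value is exactly 0:
untried_move = rave_score(mc_wins=0, mc_sims=0, amaf_wins=0, amaf_sims=0)
# The always-losing move keeps a strictly higher score, so without an
# exploration term the untried move is never selected:
assert losing_move > untried_move
```

Since beta never reaches 0 for finite mc_sims, no number of additional
simulations changes the ordering here, which is the consistency problem
I am pointing at.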

So, I don't see why the bandit would be consistent, unless we make
assumptions about the MC or the RAVE values.

I might be completely wrong; as I said,
I had only reverse-engineered the bandit in MoGo until the recent PDF file. I
trust your opinion more than mine :-)

There are people studying some specific positions with surprising behavior,
but I am not working on that with them; they might want to post their
analysis to this mailing list...




_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
