Quoting Mark Boon <[EMAIL PROTECTED]>:
> Thanks for the comments Magnus.
>
> On 20-nov-08, at 13:00, Magnus Persson wrote:
>
> The way I understood the article, after a playout it updates, at the
> current level, the nodes of all the moves played during the playout
> (if it's a win for the player) with a RAVE value that is used in a
> similar fashion to the UCT value of a node. Only for the current node
> does it update the win-visit ratio. Is that correct? This implies
> creating a lot more nodes than I'm currently doing. I have seen
> remarks that others postpone expanding a node until a certain number
> of simulations have been done. I never quite understood the need for
> that, but maybe this has to do with the fact that AMAF requires a much
> larger number of node creations?
I think you understand the basic principle of RAVE correctly. The hard
part is how to weight the AMAF value (which I call the *virtual
win-visit* ratio) together with the actual win-visit ratio. And I have
to admit that my implementation of RAVE in Valkyria might be a quite
unique interpretation of the somewhat hard-to-understand information
that has been posted here. I just implemented something that seemed to
work and tested the parameters thoroughly. But I think I need to run
new tests, since there have been so many changes; in particular, I
should test on large boards.
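
For concreteness, the basic update looks roughly like the sketch
below. The names and data layout are just illustrative, not Valkyria's
actual code, and a real implementation has to be more careful (for
example about which occurrences of a move to count):

from dataclasses import dataclass

@dataclass
class MoveStats:
    wins: int = 0            # real: move actually chosen at this node
    visits: int = 0
    virtual_wins: int = 0    # virtual (AMAF): move played later in the
    virtual_visits: int = 0  # simulation by the same color
    prior: float = 0.0       # heuristic bias, set at expansion

class Node:
    def __init__(self):
        self.stats = {}      # move -> MoveStats, one entry per child
        self.visit_count = 0

def backup(path, moves, winner):
    """path: (node, i) pairs from root to leaf, where moves[i] is the
    (move, color) actually played at that node; moves holds every
    (move, color) of the whole simulation, playout part included."""
    for node, i in path:
        node.visit_count += 1
        move, color = moves[i]
        won = 1 if winner == color else 0
        # Real win-visit update: only the move chosen at this node
        # (which may not have stats yet if the node is unexpanded).
        if move in node.stats:
            node.stats[move].visits += 1
            node.stats[move].wins += won
        # Virtual update: every move of the same color from this point
        # on that is also a candidate here, counted once per simulation.
        seen = set()
        for mv, c in moves[i:]:
            if c == color and mv not in seen and mv in node.stats:
                seen.add(mv)
                node.stats[mv].virtual_visits += 1
                node.stats[mv].virtual_wins += won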
Valkyria postpones the expansion of a new node for about 10 visits, and
an expansion includes all children of a position. This is necessary in
order to update the virtual win-visit scores of all moves of the
current color played during a simulation, even if there is only one.
Thus a node in the tree is a list of children for a given position.
This simplifies the code that biases the tree search, since all the
necessary algorithms are executed simultaneously for all children
during expansion. This is perhaps not efficient, but it is less prone
to bugs.
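
Continuing the sketch above, the delayed all-at-once expansion might
look like this. The threshold of 10 is the figure I mentioned; a
position object with a legal_moves() method is assumed, and
pattern_prior is a hypothetical stand-in for whatever pattern
heuristics an engine applies:

EXPANSION_THRESHOLD = 10  # postpone expansion for about 10 visits

def maybe_expand(node, position):
    """Create stats for *all* candidate moves at once, so that virtual
    win-visit scores can be recorded for every move from then on."""
    if node.stats or node.visit_count < EXPANSION_THRESHOLD:
        return  # already expanded, or expansion still postponed
    for move in position.legal_moves():
        s = MoveStats()
        # All biasing (patterns etc.) runs once, here, for all
        # children simultaneously; pattern_prior is hypothetical.
        s.prior = pattern_prior(position, move)
        node.stats[move] = s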
The effect of RAVE with a complete expansion is that after a few
simulations (let us say 40) we have a rather good sample of virtual
win-visit scores for all moves, since a single playout contributes
virtual samples to many moves at once. Also, if patterns bias
exploration, real win-visit scores have been collected for those moves
(often a single move) that are most likely to be the best. Thus the
win rate for this position is already close to that of the best move.
This is different from pure MC-UCT, where all candidate moves are
searched once and the win rate initially is the average of all moves,
bad as well as good ones. It takes a long time for MC-UCT to converge
to the win rate of the best move in each position.
Simply put, combining virtual and real win-visit values for searching
the tree lets us safely ignore the worst moves in all positions, and
the search basically only visits moves that our pattern heuristics
predicted to be good, or moves that were discovered thanks to AMAF.
The search is very selective when it works. And of course it often
misses the best move. But it kind of works, because a) often the
second-best move wins as well, and b) the selective search effectively
searches so deeply that it corrects itself quickly.
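
To make "combining virtual and real win-visit values" concrete: one
published way of mixing them is the schedule of Gelly and Silver, where
a weight beta starts near 1 (trust AMAF) and decays toward 0 as real
visits accumulate. Valkyria's actual weighting differs, as I said
above; this sketch reuses the MoveStats layout from earlier, and the
equivalence parameter is just a guess:

import math

RAVE_EQUIV = 1000  # "equivalence" parameter k; engine-specific guess

def combined_value(s):
    """Mix virtual (AMAF) and real win-visit ratios. beta is close to
    1 while real visits are few and decays as they accumulate."""
    beta = math.sqrt(RAVE_EQUIV / (3 * s.visits + RAVE_EQUIV))
    real = s.wins / s.visits if s.visits else 0.0
    virtual = (s.virtual_wins / s.virtual_visits
               if s.virtual_visits else 0.0)
    return beta * virtual + (1.0 - beta) * real

def select_move(node):
    # Moves with poor virtual scores get low combined values and are
    # in practice never descended into; this is where both the
    # selectivity and the risk of missing the best move come from.
    return max(node.stats, key=lambda m: combined_value(node.stats[m]))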
Magnus
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/