Quoting Mark Boon <[EMAIL PROTECTED]>:

Thanks for the comments, Magnus.

On 20-nov-08, at 13:00, Magnus Persson wrote:

The way I understood the article, after a playout it updates all the nodes at the current level, for all the moves played during the playout (if it's a win for the player), with a RAVE value that is used in a similar fashion to the UCT value of a node. Only for the current node does it update the win-visit ratio. Is that correct? This implies creating a lot more nodes than I'm currently doing. I have seen remarks that others postpone expanding a node until a certain number of simulations have been done. I never quite understood the need for that, but maybe it has to do with the fact that AMAF requires a much larger number of node creations?


I think you understand the basic principle of RAVE correctly. The hard part is how to weight the AMAF value (which I call the *virtual win-visit* ratio) together with the actual win-visit ratio. And I have to admit that my implementation of RAVE in Valkyria might be a quite unique interpretation of the somewhat hard-to-understand information that has been posted here. I just implemented something that seemed to work and tested the parameters thoroughly. But I think I need to run new tests, since there have been so many changes; especially, I should test on large boards.
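To make this concrete: this is not exactly what Valkyria does (the details are not spelled out above), but a minimal Python sketch of the kind of weighting schedule described in the RAVE literature (Gelly & Silver) looks like this, where RAVE_EQUIV is just a hypothetical tuning constant one would have to test:

    import math

    RAVE_EQUIV = 1000  # hypothetical constant: how long the virtual value dominates

    def blended_value(wins, visits, vwins, vvisits):
        # Blend the real win-visit ratio with the virtual (AMAF) one.
        # Not Valkyria's exact formula; this is the beta schedule from
        # Gelly & Silver's RAVE paper, which shifts weight from the
        # virtual value to the real value as real visits accumulate.
        if vvisits == 0:
            return wins / visits if visits else 0.5
        if visits == 0:
            return vwins / vvisits
        beta = math.sqrt(RAVE_EQUIV / (3.0 * visits + RAVE_EQUIV))
        return (1.0 - beta) * (wins / visits) + beta * (vwins / vvisits)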

Valkyria postpones expansion of a new node until it has been visited about 10 times, and an expansion creates all children of a position at once. This is necessary in order to update the virtual win-visit scores of all moves of the color to move that were played during the current simulation, even if there is only one such simulation. Thus a node in the tree is a list of children for a given position. This also simplifies the code that biases the tree search, since all the necessary algorithms are executed in a single pass over all children during expansion. It is perhaps not efficient, but it is probably less prone to bugs.
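In sketch form (again Python for brevity, and legal_moves() is a stand-in for whatever move generator one has; Valkyria itself is of course not written like this):

    EXPAND_THRESHOLD = 10  # "about 10" visits before a node is expanded

    class Child:
        def __init__(self, move):
            self.move = move
            self.wins = self.visits = 0    # real win-visit counts
            self.vwins = self.vvisits = 0  # virtual (AMAF) counts
            self.node = None               # subtree, created lazily

    class Node:
        # A node is simply the list of all children of a position.
        def __init__(self):
            self.visits = 0
            self.children = None  # not expanded yet

        def maybe_expand(self, position):
            # Postpone expansion, then create all children at once, so
            # pattern heuristics etc. can run over every move in one pass.
            if self.children is None and self.visits >= EXPAND_THRESHOLD:
                self.children = [Child(m) for m in position.legal_moves()]

    def update_virtual(node, moves_of_color_to_move, win):
        # AMAF update: credit every child whose move was played later in
        # the simulation by the side to move at this node (win is 0 or 1).
        if node.children is None:
            return
        played = set(moves_of_color_to_move)
        for c in node.children:
            if c.move in played:
                c.vvisits += 1
                c.vwins += win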

The effect of RAVE with complete expansion is that after a few simulations (let us say 40) we already have a rather good sample of virtual win-visit scores for all moves. Also, if patterns bias exploration, real win-visit scores have been collected for those moves (often a single move) that are most likely to be the best. Thus the win rate for the position is already close to that of the best move. This is different from pure MC-UCT, where every candidate move is searched once and the win rate is initially the average over all moves, bad as well as good ones. It takes a long time before MC-UCT converges to the win rate of the best move in each position.

Simply put, combining virtual and real win-visit values when searching the tree lets us safely ignore the worst moves in every position; the search basically only visits moves that our pattern heuristics predicted to be good, or moves that were discovered thanks to AMAF.
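Continuing the sketch above (a real selector would of course also add some exploration term), the selection step is then just:

    def select_child(node):
        # Pick the move with the best blended value. Early on the virtual
        # statistics dominate, so clearly bad moves are effectively never
        # visited; as real visits accumulate the real win rate takes over.
        return max(node.children,
                   key=lambda c: blended_value(c.wins, c.visits,
                                               c.vwins, c.vvisits))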

The search is very selective when it works, and of course it often misses the best move. But it kind of works anyway, because a) often the second-best move wins as well, and b) the selective search effectively searches so deeply that it corrects itself quickly.

Magnus
