Thank you very much, Silver. Interesting report! -Hideki
David Silver: <[EMAIL PROTECTED]>:
>Hi all,
>
>On 7-Feb-08, at 1:30 AM, [EMAIL PROTECTED] wrote:
>
>> Note as well that the current implementation of MoGo (not the one at
>> the time of the ICML paper) uses a different tradeoff between UCT and
>> Rave values, thanks to an idea of David Silver, which brought
>> improvements in 19x19 (where the Rave values are the most useful),
>> while it was marginal (still better) in 9x9. But anyway, we are
>> talking about 9x9 here, so it can't explain what you are talking about.
>>
>
>I think it is time to share this idea with the world :-)
>The idea is to estimate bias and variance to calculate the best
>combination of UCT and RAVE values.
>I have attached a pdf explaining the new formula.
>---- inline file
>
>
>>> (2) (....) Depending on the playout
>>> policy, adding an upper confidence bound to the rave values can push
>>> some terribly bad moves up (like playing on 1-1). The reason seems to
>>> be that such moves are normally sampled very infrequently (so the UCB
>>> will be higher), and when they are selected (...)
>>
>> That could be an explanation, but there are two points:
>> - the prior you put on top of Rave often avoids sampling 1-1 first,
>> and even when you do, you very often lose just 1 playout because of
>> the UCT value you get right away.
>> - I never observed a big discrepancy between the number of Rave
>> samples for each move.
>
>Also, the upper confidence bound reduces rapidly with RAVE, because so
>many moves are played in each playout. So even without prior
>knowledge, moves like the 1-1 point should be observed less when using
>RAVE, because they will quickly become associated with losing games.
>RAVE acts like a pruning mechanism - these bad moves don't even need
>to be played in the tree to identify that they are a bad idea. It is
>also like progressive widening, because all moves are tried in the
>tree eventually, once the UCT estimate starts to dominate the RAVE
>estimate.
>So it is perhaps not a surprise that programs with pruning
>and progressive widening see less improvement when implementing RAVE -
>the ideas overlap a great deal.
>
>Of course, the all-moves-as-first heuristic is often wrong - so RAVE
>can make big mistakes. But on average it improves performance, which
>is what matters.
>
>-Dave
>_______________________________________________
>computer-go mailing list
>computer-go@computer-go.org
>http://www.computer-go.org/mailman/listinfo/computer-go/
--
[EMAIL PROTECTED] (Kato)
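
For readers without the attached pdf, here is a rough Python sketch of what "estimate bias and variance to calculate the best combination of UCT and RAVE values" could look like. The attachment is not part of this message, so the weighting below is an assumption: it uses the schedule beta = n_rave / (n_uct + n_rave + 4 * n_uct * n_rave * b^2) that Gelly and Silver later published, and may differ from the formula in the pdf. The function names, the parameter names, and the default bias constant of 0.1 are all hypothetical.

```python
def rave_weight(n_uct: int, n_rave: int, rave_bias: float) -> float:
    """Weight (beta) given to the RAVE estimate.

    Assumed schedule, not taken from the attached pdf:
        beta = n_rave / (n_uct + n_rave + 4 * n_uct * n_rave * rave_bias**2)
    rave_bias is a tunable constant standing in for the estimated RAVE bias.
    """
    if n_rave == 0:
        # No RAVE samples yet, so rely on the UCT estimate alone.
        return 0.0
    return n_rave / (n_uct + n_rave + 4.0 * n_uct * n_rave * rave_bias ** 2)


def combined_value(q_uct: float, n_uct: int,
                   q_rave: float, n_rave: int,
                   rave_bias: float = 0.1) -> float:
    """Blend the UCT and RAVE value estimates for one move."""
    beta = rave_weight(n_uct, n_rave, rave_bias)
    return (1.0 - beta) * q_uct + beta * q_rave
```

With 1000 RAVE samples and a bias constant of 0.1, the RAVE weight is about 0.71 after 10 real playouts through the move and about 0.02 after 1000, which matches the remark above that the UCT estimate eventually dominates the RAVE estimate.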