[computer-go] Re: computer-go Digest, Vol 43, Issue 8

Hideki Kato Sat, 09 Feb 2008 02:43:30 -0800

Thank you very much, Silver.  Interesting report!

-Hideki


David Silver: <[EMAIL PROTECTED]>:
>Hi all,
>
>On 7-Feb-08, at 1:30 AM, [EMAIL PROTECTED] wrote:
>
>> Note as well that the current implementation of MoGo (not the one at
>> the time of the ICML paper) uses a different tradeoff between UCT and
>> Rave values, thanks to an idea of David Silver, which brought
>> improvements in 19x19 (where the Rave values are the most useful),
>> while the gain was marginal (but still positive) in 9x9. But anyway,
>> here we are talking about 9x9, so it can't explain what you are
>> talking about.
>>
>
>I think it is time to share this idea with the world :-)
>The idea is to estimate bias and variance to calculate the best  
>combination of UCT and RAVE values.
>I have attached a pdf explaining the new formula.
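The attached formula is not reproduced in this message, so what follows is only a minimal sketch of the general bias/variance idea and not necessarily the formula from the pdf. It assumes the UCT estimate is unbiased with variance v/n, the RAVE estimate has variance v/n_rave plus a constant bias, and the two estimates are independent. Minimising the mean squared error of (1 - beta) * q_uct + beta * q_rave under those assumptions gives the weight computed below. The names rave_weight and combined_value, the default bias of 0.1, and the choice v = 0.25 (the largest variance a win/loss outcome can have) are all assumptions made for illustration.

```python
def rave_weight(n, n_rave, bias, variance=0.25):
    """Weight on the RAVE estimate that minimises mean squared error.

    Assumes independent estimators: UCT unbiased with variance v/n,
    RAVE with variance v/n_rave and a fixed bias (hypothetical model).
    """
    if n + n_rave == 0:
        return 0.0  # no data at all; the caller falls back to its prior
    return n_rave / (n + n_rave + n * n_rave * bias ** 2 / variance)


def combined_value(q_uct, n, q_rave, n_rave, bias=0.1):
    """Blend the two estimates with the MSE-minimising weight."""
    beta = rave_weight(n, n_rave, bias)
    return (1.0 - beta) * q_uct + beta * q_rave


if __name__ == "__main__":
    # With few real visits the RAVE value dominates; with many it fades out.
    for n in (0, 10, 100, 1000):
        print(n, round(rave_weight(n, 10 * n + 10, bias=0.1), 3))
```

Under this model the weight starts at 1 when a move has RAVE samples but no real visits, and decays towards 0 as the visit count grows, at a rate set by the assumed bias.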
>---- inline file
>
>
>>> (2) (....) Depending on the playout
>>> policy, adding an upper confidence bound to the rave values can push
>>> some terribly bad moves up (like playing on 1-1). The reason seems to
>>> be that such moves are normally sampled very infrequently (so the UCB
>>> will be higher), and when they are selected (...)
>>
>> That could be an explanation, but there are two points:
>> - the prior you put on top of Rave often avoids sampling 1-1 first,
>> and even when you do, you very often lose just 1 playout because of
>> the UCT value you get right away.
>> - I never observed a big discrepancy between the number of Rave
>> samples for each move.
>
>Also, the upper confidence bound reduces rapidly with RAVE, because so  
>many moves are played in each playout. So even without prior  
>knowledge, moves like the 1-1 point should be observed less when using  
>RAVE, because they will quickly become associated with losing games.  
>RAVE acts like a pruning mechanism - these bad moves don't even need  
>to be played in the tree, to identify that they are a bad idea. It is  
>also like progressive widening, because all moves are tried in the  
>tree eventually, once the UCT estimate starts to dominate the RAVE  
>estimate. So it is perhaps not a surprise that programs with pruning  
>and progressive widening see less improvement when implementing RAVE -  
>the ideas overlap a great deal.
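The point about RAVE counts growing quickly can be illustrated with a small sketch of an all-moves-as-first update for a single tree node. This is an assumed, simplified bookkeeping scheme and not MoGo's actual code: the NodeStats class, the update function, and the move labels ("A1" standing in for the 1-1 point) are hypothetical. Each playout adds one sample to the UCT statistics of the move actually played from the node, but one RAVE sample to every move the same player made later in that playout.

```python
from collections import defaultdict


class NodeStats:
    """Per-node win/visit counters (hypothetical layout)."""

    def __init__(self):
        self.n = defaultdict(int)            # real visits per move
        self.wins = defaultdict(float)       # real wins per move
        self.n_rave = defaultdict(int)       # all-moves-as-first samples
        self.wins_rave = defaultdict(float)  # all-moves-as-first wins


def update(stats, first_move, later_moves, won):
    """Record one playout that started with first_move from this node.

    later_moves: moves the same player made later in the playout.
    """
    result = 1.0 if won else 0.0
    # UCT statistics: only the move actually chosen in the tree.
    stats.n[first_move] += 1
    stats.wins[first_move] += result
    # RAVE statistics: every move the player made, counted once.
    for move in {first_move, *later_moves}:
        stats.n_rave[move] += 1
        stats.wins_rave[move] += result


if __name__ == "__main__":
    stats = NodeStats()
    update(stats, "D4", ["A1", "C3"], won=False)
    update(stats, "C3", ["D4", "A1"], won=False)
    # "A1" was never chosen in the tree, yet it already has two losing samples.
    print(stats.n["A1"], stats.n_rave["A1"], stats.wins_rave["A1"])
```

In this sketch a move such as "A1" accumulates losing RAVE samples without ever being selected at the node, which is the pruning-like effect described above.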
>
>Of course, the all-moves-as-first heuristic is often wrong - so RAVE  
>can make big mistakes. But on average it improves performance, which  
>is what matters.
>
>-Dave
>---- inline file
>_______________________________________________
>computer-go mailing list
>computer-go@computer-go.org
>http://www.computer-go.org/mailman/listinfo/computer-go/
--
[EMAIL PROTECTED] (Kato)