Re: [computer-go] Re: language efficiency

Don Dailey Tue, 18 Dec 2007 18:47:45 -0800

Álvaro,

I'm going to take another look at alpha-beta with play-outs.   I have a
lot of new ideas I want to explore.


- Don


Álvaro Begué wrote:
>
>
> On Dec 18, 2007 4:21 PM, Don Dailey <[EMAIL PROTECTED]
> <mailto:[EMAIL PROTECTED]>> wrote:
>
>
>
>     Chris Fant wrote:
>     >>> I suspect that for very long time controls we would be better off
>     >>> turning UCT (with, say, 10K playouts) into an evaluation function and
>     >>> then using alpha-beta on top of it.
>     >>>
>     >>> Álvaro.
>     >>>
>     >> This is very interesting to me. Not the memory management part, but
>     >> the fact that you believe the tree is not being grown optimally (if that
>     >> is what you are saying).
>     >>
>     >
>     >
>     > I thought his point was that with an alpha-beta layer on top of the
>     > UCT layer, you can do much longer searches because you are throwing
>     > away the large UCT tree after each evaluation of an AB tree node.
>     >
>     I'm not sure whether he proposes this as a solution to the memory
>     problem or believes it creates a tree with a better shape.
>
>
> Actually, it's the latter. The big problem with UCT in my opinion is
> that it uses the exploration rule for two things:
>  1) visiting more probable moves more frequently, and
>  2) mixing the results of playouts to form a score that can be
> propagated up the tree.
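
A minimal sketch of the coupling described in the two points above, assuming the standard UCB1 formula. The function names, the exploration constant `c`, and the flat `wins`/`visits` lists are illustrative assumptions, not code from any program discussed in this thread. The same visit counts that steer selection also weight the score passed to the parent.

```python
import math


def ucb1_select(wins, visits, c=math.sqrt(2)):
    """Pick the child to simulate next (purpose 1: visit promising moves more).

    wins[i] / visits[i] is child i's mean playout result from the parent's
    point of view; c is an assumed exploration constant.
    """
    total = sum(visits)
    best_index, best_value = None, -math.inf
    for i, (w, n) in enumerate(zip(wins, visits)):
        if n == 0:
            return i  # try every move at least once
        value = w / n + c * math.sqrt(math.log(total) / n)
        if value > best_value:
            best_index, best_value = i, value
    return best_index


def mean_score(wins, visits):
    """Score passed up the tree (purpose 2): the average over all playouts,
    so every exploratory visit chosen above also shifts this number."""
    return sum(wins) / sum(visits)
```

Because `mean_score` is weighted by the visit counts that `ucb1_select` produces, a move that is explored and turns out badly still lowers the parent's score.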
>
> The UCB rule satisfies both purposes well enough when you are
> searching a few tens of thousands of simulations, but as you go to
> longer time controls (or, equivalently, faster hardware) you'll find
> that you still want to spend some time analyzing that queen sacrifice
> (sorry for the chess analogy), but if it results in disaster, that
> shouldn't pollute the score that I propagate up the tree! In UCT both
> notions are tied together, but as you gain more confidence in your
> search, you should converge to the score-backup rule of just returning
> the score of the best move. Once you get to that point, you can start
> using alpha-beta for pruning.
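
The difference between the two backup rules can be sketched as follows. This is an illustration under stated assumptions: each child is a hypothetical `(mean_score, visits)` pair scored from the parent's point of view, and the numbers in the usage note are made up for the example.

```python
def average_backup(children):
    """UCT-style backup: visit-weighted average of the children's scores."""
    total_visits = sum(visits for _, visits in children)
    return sum(score * visits for score, visits in children) / total_visits


def max_backup(children):
    """Minimax-style backup: the score of the best move only."""
    return max(score for score, _ in children)


# Hypothetical node: a solid move explored 900 times and a failed
# sacrifice explored 100 times.
children = [(0.6, 900), (0.1, 100)]
averaged = average_backup(children)
best_only = max_backup(children)
```

Here `averaged` comes out near 0.55 while `best_only` is 0.6: the failed sacrifice drags down the averaged score even though the player would never choose it.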
>
> Now it's time to get to work on dimwit to prove that there is some
> truth behind these reasonable-sounding ideas. As John once said, our
> jobs are getting in the way of our go programming. ;)
>
>
> Álvaro.
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
