Re: [Computer-go] some UCT notes

Álvaro Begué Mon, 24 Oct 2011 19:03:42 -0700

On Mon, Oct 24, 2011 at 1:45 PM, Dave Dyer <[email protected]> wrote:
>
> I've been working with UCT search for other games than Go, and one 
> interesting thing I"ve learned is that the results can change dramatically 
> depending on how the UCT values are manipulated as the tree grows.
>
> Consider the root node; at the beginning of the search it's desirable to 
> sample all the children equally, to be sure each has a fair chance to be 
> noted as winning or losing.  However, as the simulations continue, if this 
> egalitarian distribution continues, the simulations from losing nodes dilutes 
> the results (as well as wasting time), so it's necessary to start 
> concentrating on the winning nodes.  The exact method of transitioning from 
> broad to narrow focus can have dramatic effect on the results.


Doesn't the UCB formula basically encode this behavior? What I think I
learned about UCT from experimenting with dimwit is that, for nodes
other than the root, you need to reduce exploration so scores are not
too polluted by bad moves, but then the principal variation gets
ridiculously deep, which means that more exploration is needed. At the
root you can make the search explore more, since you don't need to
back out a score.

I don't know if go has an equivalent to queen sacrifices in chess, but
it would be very hard to make a UCT program that plays something like
that correctly: The queen sacrifice would look like a horrible move,
with really low score, and if you make the search explore enough to
figure out that it's a good move (by finding several correct
continuation moves) in a practical amount of time, the score will be
horribly polluted in the mean time.

The solution has to be disassociating how much time you spend
exploring a move and how much it contributes to the score of its
parent. I feel that UCT is great for making up a score out of repeated
simulations, but eventually we should end up thinking of it as an
evaluation function and using something much closer to minimax for the
parts of the tree close to the root. Unfortunately, I don't have any
successful experiments to back out this feeling.
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

Re: [Computer-go] some UCT notes

Reply via email to