On Mon, Oct 24, 2011 at 1:45 PM, Dave Dyer <[email protected]> wrote: > > I've been working with UCT search for other games than Go, and one > interesting thing I"ve learned is that the results can change dramatically > depending on how the UCT values are manipulated as the tree grows. > > Consider the root node; at the beginning of the search it's desirable to > sample all the children equally, to be sure each has a fair chance to be > noted as winning or losing. However, as the simulations continue, if this > egalitarian distribution continues, the simulations from losing nodes dilutes > the results (as well as wasting time), so it's necessary to start > concentrating on the winning nodes. The exact method of transitioning from > broad to narrow focus can have dramatic effect on the results.
Doesn't the UCB formula basically encode this behavior? What I think I learned about UCT from experimenting with dimwit is that, for nodes other than the root, you need to reduce exploration so scores are not too polluted by bad moves, but then the principal variation gets ridiculously deep, which means that more exploration is needed. At the root you can make the search explore more, since you don't need to back out a score. I don't know if go has an equivalent to queen sacrifices in chess, but it would be very hard to make a UCT program that plays something like that correctly: The queen sacrifice would look like a horrible move, with really low score, and if you make the search explore enough to figure out that it's a good move (by finding several correct continuation moves) in a practical amount of time, the score will be horribly polluted in the mean time. The solution has to be disassociating how much time you spend exploring a move and how much it contributes to the score of its parent. I feel that UCT is great for making up a score out of repeated simulations, but eventually we should end up thinking of it as an evaluation function and using something much closer to minimax for the parts of the tree close to the root. Unfortunately, I don't have any successful experiments to back out this feeling. _______________________________________________ Computer-go mailing list [email protected] http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
