On 5/11/07, Jason House <[EMAIL PROTECTED]> wrote:
Consider a node who's had one child extensively evaluated through transpositions. When UCT does come to visit it, the sqrt(sum(child simulations)) will be very high. That will artificially favor exploration of children with less simulations. That will have a continuing negative impact on the exploitation. ... but then once the parent gets updated with the sum(child simulations), UCT may simply stop searching that branch for a long time... Because it likely had a low average win rate, and now looks to be extensively explored. When UCT eventually does come back to it, the problem in the previous paragraph will still apply.
So the issue is that if UCT is given, a priori, a tree that's already been explored (say by some other algorithm) in a highly imbalanced order, it isn't able to rebalance? It seems like there ought to be a way to make it use whatever information it's been given in a smarter way. After all, knowing that someone else explored one of the nodes 1000 times does mean something. Another issue (or maybe another way of putting it): a parent may get a large number of "virtual simulations" for free via differing paths to the same descendants, but these simulations can't be considered independent when calculating probabilities and confidence. On the other hand, if you don't have transposition tables, some paths will be explored that happen to mirror each other and yet the algorithm won't notice the correlation and consider them as independent runs when calculating probabilities. It seems like knowing about a correlation should make it possible to estimate the true probability better than not knowing it. _______________________________________________ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/