On 5/11/07, Jason House <[EMAIL PROTECTED]> wrote:

Consider a node who's had one child extensively evaluated through
transpositions.  When UCT does come to visit it, the sqrt(sum(child
simulations)) will be very high.  That will artificially favor exploration
of children with less simulations.  That will have a continuing negative
impact on the exploitation.

... but then once the parent gets updated with the sum(child simulations),
UCT may simply stop searching that branch for a long time... Because it
likely had a low average win rate, and now looks to be extensively explored.
 When UCT eventually does come back to it, the problem in the previous
paragraph will still apply.

So the issue is that if UCT is given, a priori, a tree that's already
been explored (say by some other algorithm) in a highly imbalanced
order, it isn't able to rebalance?  It seems like there ought to be a
way to make it use whatever information it's been given in a smarter
way.  After all, knowing that someone else explored one of the nodes
1000 times does mean something.

Another issue (or maybe another way of putting it): a parent may get a
large number of "virtual simulations" for free via differing paths to
the same descendants, but these simulations can't be considered
independent when calculating probabilities and confidence.

On the other hand, if you don't have transposition tables, some paths
will be explored that happen to mirror each other and yet the
algorithm won't notice the correlation and consider them as
independent runs when calculating probabilities. It seems like knowing
about a correlation should make it possible to estimate the true
probability better than not knowing it.
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to