On Wed, Aug 13, 2008 at 5:00 PM, Gian-Carlo Pascutto <[EMAIL PROTECTED]> wrote:
> The problem is that the optimal settings for UCT appear to be much stronger
> on the exploitation side than on the exploration side, making it much more
> likely that such work is really wasted.
I'm not sure it's that clear. In a node where one move is the clear
favorite and exploitation repeatedly selects it, the selection would
be the same even with slightly outdated information. But in a node
with many roughly equal moves, it's more likely that new information
will change the best move (by UCT). IMO it's pretty hard to waste
work in UCT, as each playout adds some information. The question is
how much, and the UCB part of UCT is there to maximize the
information we do get.
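
To make the stale-information argument concrete, here is a rough
sketch of UCB1 selection applied to two made-up nodes. The win/visit
counts, the exploration constant c = 0.5, and the helper names are
all assumptions chosen for illustration, not taken from any actual
program. "Stale" is what a thread saw a few playouts ago, "fresh"
includes three more (lost) playouts on move 0.

```python
import math

def ucb1(wins, visits, parent_visits, c=0.5):
    """UCB1 score: mean reward plus an exploration bonus.
    c = 0.5 is an assumed, exploitation-leaning constant."""
    if visits == 0:
        return math.inf
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select(children, c=0.5):
    """Index of the child (wins, visits) with the highest UCB1 score."""
    parent_visits = sum(v for _, v in children)
    scores = [ucb1(w, v, parent_visits, c) for w, v in children]
    return max(range(len(children)), key=scores.__getitem__)

def add_playouts(children, index, wins, visits):
    """Copy of children with extra playouts credited to one move."""
    updated = list(children)
    w, v = updated[index]
    updated[index] = (w + wins, v + visits)
    return updated

# Hypothetical node with one clear favorite (move 0).
stale_favorite = [(80, 100), (10, 30), (8, 30)]
fresh_favorite = add_playouts(stale_favorite, 0, wins=0, visits=3)

# Hypothetical node with three nearly equal moves.
stale_equal = [(51, 100), (50, 100), (49, 100)]
fresh_equal = add_playouts(stale_equal, 0, wins=0, visits=3)

print("favorite:", select(stale_favorite), select(fresh_favorite))
print("equal:   ", select(stale_equal), select(fresh_equal))
```

With these numbers the favorite node picks move 0 either way, so a
thread working from the stale counts does the same thing it would
have done with fresh ones. In the near-equal node the three extra
playouts are enough to flip the choice from move 0 to move 1.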

Of course I'm talking about a shared memory model, where the
information available to any processor is equal, but might be outdated
by at most a few playouts. If you have a MoGo-style distributed model,
you are indeed correct. I can see bad moves high up in the tree being
explored many times by different processors. I think their distributed
model has a lot of room for improvement (though from my perspective it
is of course quite an achievement to get a big improvement on such
hardware at all).
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
