On Wed, Aug 13, 2008 at 5:00 PM, Gian-Carlo Pascutto <[EMAIL PROTECTED]> wrote:
> The problem is that the optimal settings for UCT appear to be much stronger
> on the exploitation side than on the exploration side, making it much more
> likely that such work is really wasted.

I'm not sure it's that clear. In a node where one move is the clear favorite and exploitation repeatedly selects it, that move would still be selected even with slightly outdated information. But in a node with many roughly equal moves, it's more likely that new information will change which move is best (by UCT).

IMO it's pretty hard to waste work in UCT, as each playout adds some information. The question is how much, and the UCB part of UCT is there to maximize the information we do get.

Of course I'm talking about a shared memory model, where every processor sees the same information, though it might be outdated by at most a few playouts. If you have a MoGo-style distributed model, you are indeed correct. I can see bad moves high up in the tree being explored many times by different processors. I think their distributed model has a lot of room for improvement (but from my perspective it is of course quite an achievement to get a big improvement on such hardware at all).
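
To make the stale-information point concrete, here is a minimal sketch in Python. It uses the plain UCB1 formula with an exploration constant of 0.5, and the win/visit counts are made up for illustration; none of this is taken from any particular program. It compares the move picked from current statistics with the move picked after three more (lost) playouts have gone through the first move, which is roughly the lag a processor might see in a shared memory setup.

```python
import math


def ucb1_select(wins, visits, c=0.5):
    """Return the index of the child with the highest UCB1 score.

    c is the exploration constant; 0.5 is an arbitrary choice for this sketch.
    """
    total = sum(visits)

    def score(i):
        if visits[i] == 0:
            return math.inf  # unvisited children are tried first
        mean = wins[i] / visits[i]
        return mean + c * math.sqrt(math.log(total) / visits[i])

    return max(range(len(visits)), key=score)


def after_lost_playouts(wins, visits, move, k):
    """Statistics after k more playouts through `move`, all of them lost."""
    new_visits = list(visits)
    new_visits[move] += k
    return list(wins), new_visits


# Node with a clear favorite (made-up numbers).
fav_wins, fav_visits = [700, 30, 25], [1000, 100, 100]
# Node with nearly equal moves (made-up numbers).
eq_wins, eq_visits = [51, 50, 50], [100, 100, 100]

# Selection based on stale statistics.
fav_stale = ucb1_select(fav_wins, fav_visits)
eq_stale = ucb1_select(eq_wins, eq_visits)

# Selection once three more lost playouts through move 0 are visible.
fav_fresh = ucb1_select(*after_lost_playouts(fav_wins, fav_visits, 0, 3))
eq_fresh = ucb1_select(*after_lost_playouts(eq_wins, eq_visits, 0, 3))

print("clear favorite:", fav_stale, "->", fav_fresh)
print("near-equal moves:", eq_stale, "->", eq_fresh)
```

With these numbers the clear-favorite node picks the same move either way, so a processor working from stale statistics loses nothing there. In the near-equal node the choice flips, which is the case where outdated information can actually send a playout somewhere UCT would no longer choose.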