I copied this from another post someone made:

Here is a summary of how it works:

- Use probability of winning as score, not territory
- Use the average outcome as position value
- Select the move that maximizes v + sqrt((2*log(t))/(10*n))

v is the value of the move (the average outcome, between 0 and 1), n is the number of simulations of this move, and t is the total number of simulations at the current position. If a move has n = 0, it is selected first.

Is the formula confusing you? I will try to break it down.

You have statistics on every node of the game tree. Think of the formula above as a rule for deciding which node in the game tree to follow. His terminology was a bit confusing, but when he says "the move/this move" he means some child node under consideration, and when he says "current position" he means the current node. So n is the visit count of that particular child node. Apply the formula to every child node and choose the one with the highest value. If one or more of the child nodes have not been visited, pick one of them arbitrarily.

I use a completely different formula in my program for selecting a move, but I'm testing this formula now.

- Don

On Wed, 2007-01-17 at 17:33 +0000, Jacques Basaldúa wrote:
> Hi, Don
>
> Don Dailey wrote:
>
> > v + sqrt((2*log(t))/(10*n)) ..
> > .. n the number of simulations of this move
>
> 1. Does that mean the number in any branch?
> Do you store an array with the number of times
> each move is played, no matter in what branch?
>
> 2. Do you have some explanation for this expression?
>
> Thank you for sharing your results. I am very interested
> in memory issues of UCT.
>
> Jacques.
>
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
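
For illustration, the selection rule v + sqrt((2*log(t))/(10*n)) could be sketched in Python roughly as follows. This is only a sketch under assumptions, not code from any actual program: the names `uct_score` and `select_child` and the `wins`/`visits` dictionary keys are made up for the example, `log` is taken to be the natural logarithm, and t is computed as the sum of the children's visit counts (a program might instead store the count on the parent node).

```python
import math
import random


def uct_score(v, n, t):
    """Score of one child: v + sqrt((2*log(t)) / (10*n)).

    v: average outcome of the child, between 0 and 1
    n: number of simulations through this child (must be > 0)
    t: total number of simulations at the current (parent) node
    Assumption: log is the natural logarithm.
    """
    return v + math.sqrt((2 * math.log(t)) / (10 * n))


def select_child(children):
    """Pick which child node to follow from the current node.

    children: list of dicts with hypothetical keys 'wins' and 'visits'.
    """
    # Any child that has never been visited is tried first,
    # chosen arbitrarily among the unvisited ones.
    unvisited = [c for c in children if c["visits"] == 0]
    if unvisited:
        return random.choice(unvisited)

    # Assumption: t is the sum of the children's visit counts.
    t = sum(c["visits"] for c in children)

    # Apply the formula to every child and keep the highest score.
    return max(
        children,
        key=lambda c: uct_score(c["wins"] / c["visits"], c["visits"], t),
    )


if __name__ == "__main__":
    children = [
        {"wins": 6, "visits": 10},
        {"wins": 3, "visits": 4},
        {"wins": 50, "visits": 100},
    ]
    print(select_child(children))
```

Note that the statistics live on each child node, so the same move reached through a different branch would have its own separate v and n.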