I copied this from another post someone made:

 Here is a summary of how it works:
  - Use probability of winning as score, not territory
  - Use the average outcome as position value
  - Select the move that maximizes v + sqrt((2*log(t))/(10*n))

  v is the value of the move (average outcome, between 0 and 1), n the
  number of simulations of this move, and t the total number of
  simulations at the current position. In case a move has n = 0, it is
  selected first.

Is the formula confusing you?  I will try to break it down.

You have statistics on every node of the game tree, so v and n
belong to one particular child node, not to a move counted across
all branches.  Think of the above formula as a rule for deciding
which node in the game tree to follow next.  His terminology was a
bit confusing: when he says "the move/this move" he means some
child node under consideration, and when he says "current position"
he means the current node.  Apply the formula to every child node
and choose the one with the highest value.  If one or more of the
child nodes have not been visited, pick one of them arbitrarily.
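
To make that concrete, here is a rough sketch of the selection step in
Python.  It is not taken from any actual program: the Node class, its
field names and select_child are made up for illustration, and I am
assuming that t is simply the visit count stored on the current node.

```python
import math
import random


class Node:
    """Per-node statistics (hypothetical layout, for illustration only)."""

    def __init__(self):
        self.visits = 0      # n: number of simulations through this node
        self.wins = 0.0      # sum of outcomes (1 = win, 0 = loss)
        self.children = []   # one child node per move tried from here

    def value(self):
        # v: average outcome, between 0 and 1
        return self.wins / self.visits


def select_child(node, rng=random):
    """Choose which child of `node` to follow, using the quoted formula."""
    # Unvisited children (n = 0) are tried first, picked arbitrarily.
    unvisited = [c for c in node.children if c.visits == 0]
    if unvisited:
        return rng.choice(unvisited)

    # Assumption: t is the visit count kept on the current node.
    t = node.visits

    def score(child):
        n = child.visits
        return child.value() + math.sqrt((2 * math.log(t)) / (10 * n))

    return max(node.children, key=score)
```

With t = 12, a child that has v = 0.6 after 10 simulations scores about
0.82, while a child that has v = 0.5 after only 2 simulations scores
about 1.00, so the less explored child is followed next even though its
average is lower.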

I use a completely different formula in my program for selecting
a move, but I'm testing this formula now.

- Don




On Wed, 2007-01-17 at 17:33 +0000, Jacques Basaldúa wrote:
> Hi, Don
> 
> Don Dailey wrote:
> 
>  > v + sqrt((2*log(t))/(10*n)) ..
>  > .. n the number of simulations of this move
> 
> 1. Does that mean the number in any branch?
> Do you store an array with the number of times
> each move is played, no matter in what branch?
> 
> 2. Do you have some explanation for this expression?
> 
> Thank you for sharing your results. I am very interested
> in memory issues of UCT.
> 
> Jacques.
> 
> 
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/

