> How do you handle n_i = 0 in equation 1.2? 

I don't compute it when n_i = 0. At that point, I give every such node a high 
upper bound, so that every node gets visited once. Remember that I am using 
"progressive unpruning" at the same time, so only the best nodes according to 
the heuristic are visited.
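
For concreteness, here is a minimal sketch (Python, with invented names) of how 
the selection step can give an unvisited child a high upper bound -- here simply 
infinity -- so that every node is simulated once before the formula of equation 
1.2 is applied:

  import math

  def uct_value(parent_n, n_i, w_i, c=1.0):
      # n_i = 0: return a huge upper bound so the child is tried once
      # before the normal formula (which would divide by zero) is used.
      if n_i == 0:
          return float('inf')
      v_i = w_i / n_i                     # observed win rate
      return v_i + c * math.sqrt(math.log(parent_n) / n_i)

  def select_child(children, parent_n):
      # children: list of (n_i, w_i) tuples; with progressive unpruning
      # only the children with the best heuristic scores are in this
      # list at all.
      return max(children, key=lambda ch: uct_value(parent_n, ch[0], ch[1]))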

> I'd assume that if a heuristic says a move is really bad (H_B = 0), then 
> you'd want to avoid simulations for a while.

Yes! This is exactly the purpose of "progressive unpruning".

> p_hat = (w_i + n_h*H_B)/(n_i+n_h)

Interesting... But then how do you compute n_h in practice?

If I understood Rémi correctly, he computes the value by adding one virtual 
loss and one virtual win for each gamma, i.e., with your notation: 
p_hat = (w_i + gamma)/(n_i + 2*gamma)

This formula tends to center the probability distribution on 0.5. But if a 
pattern is good, I would rather shift the distribution towards 1! Adding a term 
in gamma/n_i, as I do in my paper, is one way to shift the probability 
distribution towards 1, and it works well in practice.
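
To make the difference concrete, here is a toy comparison in Python (the 
function names are only for illustration, and gamma plays the role of the 
heuristic term):

  def p_hat_virtual(w_i, n_i, gamma):
      # Virtual win/loss variant described above: one virtual win and
      # one virtual loss per gamma, which pulls the estimate toward 0.5.
      return (w_i + gamma) / (n_i + 2 * gamma)

  def p_hat_shifted(w_i, n_i, gamma):
      # Variant with a gamma/n_i term added to the win rate, which
      # shifts the estimate further towards 1 (only used once n_i > 0).
      return (w_i + gamma) / n_i

  # Observed win rate is 0.3; the first estimate moves to about 0.36,
  # the second to 0.5.
  print(p_hat_virtual(3, 10, 2))
  print(p_hat_shifted(3, 10, 2))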

For the full version of my paper I will compare different ways to modify the 
probability distribution according to knowledge. 
I believe there is no optimal way to do that :(

This problem is fundamental, because solving it well makes it possible to 
include previously learned knowledge in Monte-Carlo Tree Search. 


-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Jason House
Sent: Thu 17/05/2007 22:26
To: computer-go
Subject: Re: [computer-go] JCIS extended abstract
 
How do you handle n_i = 0 in equation 1.2?  I'd assume that if a heuristic
says a move is really bad (H_B = 0), then you'd want to avoid simulations
for a while.

Also, has Mango experimented with other related strategies for a soft
transition?

In my own mathematical recreation, I came up with a slightly variant method.
Just to keep the text short, here's the notation that I'll use...
  N = number of simulations through parent node
  n_i = number of simulations through this node
  w_i = winning simulations through this node
  v_i = w_i / n_i
  H_B = heuristic_bias
  n_h = heuristic confidence (used in my equation)
  UCTdelta = sqrt(ln(N)/n_i)
  p_hat = estimated probability of winning with soft transition

... then equation 1.2 is p_hat + UCTdelta

For you,
  p_hat = (w_i + H_B)/n_i
I derived and posted the following formula to the list:
  p_hat = (w_i + n_h*H_B)/(n_i+n_h)

The two equations are fairly similar... especially if n_h is set to 1.  Then
the only difference is that I replace n_i with (n_i+1)... which allows p_hat
to be calculated with n_i = 0.  I made no attempt to handle UCTdelta, but
replacing n_i in there with n_i+n_h is probably reasonable.
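
A minimal sketch of that soft transition in Python (the function name is 
invented, and N >= 1 is assumed):

  import math

  def soft_transition_value(N, n_i, w_i, H_B, n_h=1.0):
      # p_hat blends the heuristic bias H_B with the observed win rate,
      # weighting the heuristic as if it were worth n_h simulations.
      # It stays defined at n_i = 0, where it simply returns H_B.
      p_hat = (w_i + n_h * H_B) / (n_i + n_h)
      # The same n_i -> n_i + n_h substitution keeps the exploration
      # term finite at n_i = 0.
      uct_delta = math.sqrt(math.log(N) / (n_i + n_h))
      return p_hat + uct_delta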

On 5/17/07, Chaslot G (MICC) <[EMAIL PROTECTED]> wrote:
>
>  Dear all,
>
>
> Following the example of Rémi, I would like to share with you a paper that
> I wrote which describes some important elements of my Go program Mango.
>
> I recently submitted this paper to the JCIS 2007 workshop. Since it is an
> extended abstract, a lot of details are missing. I am now working on the
> full paper, which will contain much more information.
>
>
>
> The extended abstract can be found here:
> www.cs.unimaas.nl/g.chaslot/papers/pMCTS.pdf
>
>
>
> Cheers,
>
>
>
> Guillaume
>
>
>
> PS: To answer Sylvain's question: modifying the probability distribution
> of moves in the tree was more effective in Mango than the improvement made
> to the simulation strategy. However, Mango's simulation strategy is not as
> good as CrazyStone's or Mogo's. So I guess Mango won against Mogo in the
> last KGS tournament because it had a better way to use domain knowledge in
> the tree.
>
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>

