> How do you handle n_i = 0 in equation 1.2?

I don't compute it when n_i = 0. At that point, I give every node a high
upper bound in order to visit every node once. Remember that I am using
"progressive unpruning" at the same time, so only the best nodes according
to the heuristic are visited.
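(To make this concrete, here is a minimal Python sketch of the idea, not
Mango's actual code: the Node fields, the FPU constant, and the unpruning
schedule are illustrative assumptions.)

    import math

    class Node:
        """Hypothetical tree node: n_i visits, w_i wins, heuristic score."""
        def __init__(self, heuristic):
            self.n = 0              # n_i: simulations through this node
            self.w = 0              # w_i: winning simulations
            self.heuristic = heuristic

    FPU = 1e9  # the "high upper bound" given to unvisited nodes

    def uct_value(parent_n, node, c=1.0):
        if node.n == 0:
            return FPU              # equation 1.2 is not computed when n_i = 0
        return node.w / node.n + c * math.sqrt(math.log(parent_n) / node.n)

    def select_child(parent_n, children):
        # Progressive unpruning: only the k best children according to the
        # heuristic are candidates, and k grows slowly with the parent's
        # visit count (this schedule is made up for illustration).
        k = max(1, int(math.log(parent_n + 1, 2)))
        ranked = sorted(children, key=lambda c: c.heuristic, reverse=True)
        return max(ranked[:k], key=lambda c: uct_value(parent_n, c))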
> I'd assume that if a heuristic says a move is really bad (H_B = 0), then
> you'd want to avoid simulations for a while.

Yes! This is exactly the purpose of "progressive unpruning".

> p_hat = (w_i + n_h*H_B)/(n_i + n_h)

Interesting... But then how do you compute n_h in practice?

If I understood Remi correctly, he computes the value by adding one virtual
loss and one virtual win for each gamma, i.e., with your notation:

p_hat = (w_i + gamma)/(n_i + 2*gamma)

This formula tends to center the probability distribution on 0.5. But if a
pattern is good, I would rather shift the distribution towards 1! Adding a
term in gamma/n_i, as I do in my paper, is a solution to shift the
probability distribution towards 1, and it works well in practice.

For the full version of my paper I will compare different ways to modify
the probability distribution according to knowledge. I believe there is no
optimal way to do that :( This problem is fundamental, because it gives the
opportunity to include previously learned knowledge into Monte-Carlo Tree
Search.

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Jason House
Sent: Thu 17/05/2007 22:26
To: computer-go
Subject: Re: [computer-go] JCIS extended abstract

How do you handle n_i = 0 in equation 1.2?

I'd assume that if a heuristic says a move is really bad (H_B = 0), then
you'd want to avoid simulations for a while.

Also, has Mango experimented with other related strategies for a soft
transition? In my own mathematical recreation, I came up with a slight
variant of the method. Just to keep the text short, here's the notation
that I'll use...

N = number of simulations through the parent node
n_i = number of simulations through this node
w_i = winning simulations through this node
v_i = w_i / n_i
H_B = heuristic_bias
n_h = heuristic confidence (used in my equation)
UCTdelta = sqrt(ln(N)/n_i)
p_hat = estimated probability of winning with the soft transition

... then equation 1.2 is p_hat + UCTdelta.

For you, p_hat = (w_i + H_B)/n_i.

I derived and posted the following formula to the list:

p_hat = (w_i + n_h*H_B)/(n_i + n_h)

The two equations are fairly similar... especially if n_h is set to 1.
Then the only difference is that I replace n_i with (n_i + 1), which allows
p_hat to be calculated with n_i = 0. I made no attempt to handle UCTdelta,
but replacing n_i in there with n_i + n_h is probably reasonable.

On 5/17/07, Chaslot G (MICC) <[EMAIL PROTECTED]> wrote:
>
> Dear all,
>
> Following the example of Rémi, I would like to share with you a paper
> that I wrote which describes some important elements of my Go program
> Mango.
>
> I submitted this paper recently to the JCIS workshop 2007. Because it is
> an extended abstract, a lot of details are missing. I am now working on
> the full paper, which will contain much more information.
>
> The extended abstract can be found here:
> www.cs.unimaas.nl/g.chaslot/papers/pMCTS.pdf
>
> Cheers,
>
> Guillaume
>
> PS: To answer Sylvain's question: modifying the probability distribution
> of moves in the tree was more effective in Mango than the improvement
> made to the simulation strategy. However, Mango's simulation strategy is
> not as good as CrazyStone's or Mogo's. So I guess Mango won against Mogo
> in the last KGS tournament because it had a better way to use domain
> knowledge in the tree.
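(For reference, a small Python sketch gathering the formulas discussed
above in one place. The function names and the exploration constant c are
illustrative, and the exact form of the gamma/n_i term used in the paper
may differ from the reading below.)

    import math

    def p_hat_virtual(w_i, n_i, gamma):
        # Remi's variant as described above: one virtual win and one
        # virtual loss per unit of gamma, pulling the estimate towards 0.5.
        return (w_i + gamma) / (n_i + 2 * gamma)

    def p_hat_additive(w_i, n_i, gamma):
        # One reading of "adding a term in gamma/n_i": pushes the value up
        # towards (and possibly above) 1 instead of towards 0.5.
        # Requires n_i >= 1, consistent with not computing it for n_i = 0.
        return w_i / n_i + gamma / n_i

    def p_hat_soft(w_i, n_i, h_b, n_h):
        # (w_i + n_h*H_B)/(n_i + n_h): n_h virtual simulations whose result
        # equals the heuristic bias H_B; defined even when n_i = 0.
        return (w_i + n_h * h_b) / (n_i + n_h)

    def soft_uct_value(N, w_i, n_i, h_b, n_h, c=1.0):
        # Equation 1.2 with the soft transition: p_hat + UCTdelta, where
        # n_i is replaced by n_i + n_h in the exploration term as suggested.
        return p_hat_soft(w_i, n_i, h_b, n_h) + \
            c * math.sqrt(math.log(N) / (n_i + n_h))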