Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread Rémi Coulom
David Silver wrote:
>2. Run another N simulations, average the value of psi(s,a) over all positions and moves in games that black won (call this g)
This is strange: you do not take lost playouts into consideration. I believe there is a problem with your estimation of the gradient. Suppo

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread Rémi Coulom
Rémi Coulom wrote: The fundamental problem here may be that your estimate of the gradient is biased by the playout policy. You should probably sample X(s) uniformly at random to have an unbiased estimator. Maybe this can be fixed with importance sampling, and then you may get a formula that is

[computer-go] Today's "Guardian"

2009-04-30 Thread Nick Wedd
Today's "Guardian" newspaper has, on the front page of its technology supplement, an article about recent developments in computer Go. You can read it at http://www.guardian.co.uk/technology/2009/apr/30/games-software-mogo Unlike the last newspaper article on computer Go that I saw, this one i

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread David Silver
Hi Remi, This is strange: you do not take lost playouts into consideration. I believe there is a problem with your estimation of the gradient. Suppose for instance that you count z = +1 for a win, and z = -1 for a loss. Then you would take lost playouts into consideration. This makes me a

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread Rémi Coulom
David Silver wrote: Sorry, I should have made it clear that this assumes that we are treating black wins as z=1 and white wins as z=0. In this special case, the gradient is the average of games in which black won. But yes, more generally you need to include games won by both sides. The algori

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread David Silver
Hi Remi, I understood this. What I find strange is that using -1/1 should be equivalent to using 0/1, but your algorithm behaves differently: it ignores lost games with 0/1, and uses them with -1/1. Imagine you add a big constant to z. One million, say. This does not change the problem. Y
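The encoding question in this message can be checked numerically. The sketch below is only an illustration, not the algorithm from the thread: it assumes the gradient estimate is the batch average of z times a per-playout psi total, it collapses psi to a single scalar, and the four playouts are made-up numbers. `gradient_estimate` and the variable names are hypothetical.

```python
def gradient_estimate(outcomes, psi_sums):
    """Batch average of z * psi (psi collapsed to one scalar per playout)."""
    return sum(z * p for z, p in zip(outcomes, psi_sums)) / len(outcomes)

# Made-up batch of four playouts: who won, and a scalar psi total for each.
black_won = [True, False, True, False]
psi_sums = [0.5, -0.25, 0.25, 1.0]

z01 = [1 if w else 0 for w in black_won]     # win = 1, loss = 0
zpm = [1 if w else -1 for w in black_won]    # win = +1, loss = -1
zbig = [z + 1_000_000 for z in z01]          # 0/1 shifted by a big constant

g01 = gradient_estimate(z01, psi_sums)    # lost playouts contribute nothing
gpm = gradient_estimate(zpm, psi_sums)    # lost playouts contribute -psi
gbig = gradient_estimate(zbig, psi_sums)  # every playout contributes

print(g01, gpm, gbig)
```

On this batch the three encodings give three different sample estimates, and shifting z by a constant c moves the estimate by c times the batch mean of psi. The sketch only makes that dependence visible; it does not settle which encoding is appropriate.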

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread Michael Williams
I wish I was smart :(
David Silver wrote: Hi Remi, I understood this. What I find strange is that using -1/1 should be equivalent to using 0/1, but your algorithm behaves differently: it ignores lost games with 0/1, and uses them with -1/1. Imagine you add a big constant to z. One millio

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread David Silver
Hi Yamato, Thanks for the detailed explanation. M, N and alpha are constant numbers, right? What did you set them to? You're welcome! Yes, in our experiments they were just constant numbers M=N=100. The feature vector is the set of patterns you use, with value 1 if a pattern is matched and

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread David Silver
IMO other people's equations/code/ideas/papers always seem smarter than your own. The stuff you understand and do yourself just seems like common sense, and the stuff you don't always has a mystical air of complexity, at least until you understand it too :-)
On 30-Apr-09, at 1:59 PM, Michae

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread Yamato
David Silver wrote:
>Yes, in our experiments they were just constant numbers M=N=100.
If M and N are the same, is there any reason to run M simulations and N simulations separately? What happens if you combine them and calculate V and g in the single loop?
>Okay, let's continue the example above
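The two options in this question can be written side by side. This is a minimal sketch under stated assumptions: V is taken to be the average outcome z over a batch, g the average of z times psi, psi is collapsed to a scalar, and `toy_playout` is a made-up stand-in for a real simulation. None of these names come from the thread.

```python
import random

def toy_playout(rng):
    """Hypothetical stand-in for one simulation: returns (z, psi)."""
    psi = rng.uniform(-1.0, 1.0)                      # made-up scalar psi total
    z = 1 if rng.random() < 0.5 + 0.25 * psi else 0   # made-up win probability
    return z, psi

def separate_batches(rng, m, n):
    """V from M playouts, then g from N further playouts (M + N in total)."""
    v = sum(toy_playout(rng)[0] for _ in range(m)) / m
    g = sum(z * psi for z, psi in (toy_playout(rng) for _ in range(n))) / n
    return v, g

def single_loop(rng, n):
    """V and g accumulated from the same N playouts."""
    total_z, total_zpsi = 0, 0.0
    for _ in range(n):
        z, psi = toy_playout(rng)
        total_z += z
        total_zpsi += z * psi
    return total_z / n, total_zpsi / n

print(separate_batches(random.Random(0), 100, 100))
print(single_loop(random.Random(0), 100))
```

The single loop halves the number of playouts, but V and g then come from the same sample and are no longer independent estimates. Whether that dependence matters for the update is exactly what the question asks, and the sketch does not answer it.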