Re: [computer-go] Monte-Carlo Simulation Balancing

2009-08-13 Thread Olivier Teytaud
> Just to clarify: I was not saying that Mogo's policy consisted
> *solely* of looking for patterns around the last move. Merely that
> it does not look for patterns around *every* point, which other
> playout policies (e.g., CrazyStone, if I understand Remi's papers
> correctly) appear to do.

[computer-go] Monte-Carlo Simulation Balancing

2009-08-13 Thread Brian Sheppard
> > Pebbles has a "Mogo" playout design, where you check
> > for patterns only around the last move (or two).
>
> In MoGo, it's not only around the last move (at least with some
> probability and when there are empty spaces in the board); this is
> the "fill board" modification.

Just to clarify: I was not saying that Mogo's policy consisted *solely* of looking for patterns around the last move. Merely that it does not look for patterns around *every* point, which other playout policies (e.g., CrazyStone, if I understand Remi's papers correctly) appear to do.

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-08-13 Thread Olivier Teytaud
> Pebbles has a "Mogo" playout design, where you check
> for patterns only around the last move (or two).

In MoGo, it's not only around the last move (at least with some probability and when there are empty spaces in the board); this is the "fill board" modification. (this provides a big improvement

[computer-go] Monte-Carlo Simulation Balancing

2009-08-13 Thread Brian Sheppard
> Is anyone (besides the authors) doing research based on this?

Well, Pebbles does apply reinforcement learning (RL) to improve its playout policy. But not in the manner described in that paper. There are practical obstacles to directly applying that paper. To directly apply that paper, you must h

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-08-13 Thread Jason House
A web search turned up a 2-page and an 8-page version. I read the short one. I agree that it's promising work that requires some follow-up research. Now that you've read it so many times, what excites you about it? Can you envision a way to scale it to larger patterns and boards on modern

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-08-13 Thread Michael Williams
In future papers they should avoid using a strong authority like Fuego for the training and instead force it to learn from a naive uniform random playout policy (with 100x or 1000x more playouts) and then build on that with an iterative approach (as was suggested in the paper). I also had another

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-08-13 Thread Isaac Deutsch
I admit I had trouble understanding the details of the paper. What I think is the biggest problem for applying this to bigger (up to 19x19) games is that you somehow need access to the "true" value of a move, i.e. whether it's a win or a loss. On the 5x5 board they used, this might be approximated
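One way such an approximate "true" value is typically produced is by deep rollouts from a strong player. A minimal sketch of that idea (all names hypothetical; assumes some strong_policy move generator and a basic game-state API):

    def deep_rollout_value(position, num_rollouts=10000):
        """Approximate the 'true' value of a position as the win rate
        over many rollouts played by a strong policy. Returns a value
        in [0, 1] from Black's perspective."""
        wins = 0
        for _ in range(num_rollouts):
            state = position.copy()
            while not state.is_terminal():
                state = state.play(strong_policy(state))  # hypothetical strong move generator
            wins += state.black_won()  # 1 if Black won, else 0
        return wins / num_rollouts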

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-08-12 Thread Michael Williams
After about the 5th reading, I'm concluding that this is an excellent paper. Is anyone (besides the authors) doing research based on this? There is a lot to do.

David Silver wrote:
> Hi everyone, Please find attached my ICML paper with Gerry Tesauro
> on automatically learning a simulation po

[computer-go] Monte-Carlo Simulation Balancing

2009-06-22 Thread Isaac Deutsch
Has anyone tried this algorithm improvement on bigger boards and can give us a result? Link to the original message: http://computer-go.org/pipermail/computer-go/2009-April/018159.html

Thanks, ibd

> > So maybe I could get 600 more Elo points
> > with your method. And even more on 19x19.
> > I notice

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-05-04 Thread David Silver
Hi,

We used alpha=0.1. There may well be a better setting of alpha, but this appeared to work nicely in our experiments.
-Dave

On 3-May-09, at 2:01 AM, elife wrote:
> Hi Dave, In your experiments, what's the value of the constant alpha
> you set? Thanks.
>
> 2009/5/1 David Silver:
> > Yes, in our experi
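For concreteness, here is roughly where a constant alpha like this enters: it is the step size of the stochastic gradient update on the policy weights theta. A sketch (the target v_star, playout estimate v_hat, and gradient estimate g_hat are assumed computed as in the paper):

    ALPHA = 0.1  # constant step size; reported to work nicely in the experiments

    def update_weights(theta, v_star, v_hat, g_hat):
        """One simulation-balancing gradient step: scale the gradient
        estimate by the error between the deep-rollout target V* and
        the playout value estimate V."""
        error = v_star - v_hat
        return [t + ALPHA * error * g for t, g in zip(theta, g_hat)]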

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-05-03 Thread elife
Hi Dave,

In your experiments, what's the value of the constant alpha you set? Thanks.

2009/5/1 David Silver:
> Yes, in our experiments they were just constant numbers M=N=100.

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-05-01 Thread David Silver
Hi Yamato,

> If M and N are the same, is there any reason to run M simulations and
> N simulations separately? What happens if you combine them and
> calculate V and g in a single loop?

I think it gives the wrong answer to do it in a single loop. Note that the simulation outcomes z are used
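The independence point can be made concrete with a sketch (simulate() and simulate_with_trace() are hypothetical helpers, returning an outcome z, and z together with the summed grad-log-policy vector, respectively). Because the update multiplies (V* - V) by g, the two estimates should come from independent simulations: E[V*g] = E[V]*E[g] only holds under independence, so a single shared loop would bias the product.

    def estimate_v_and_g(position, theta, M=100, N=100):
        """Estimate V from M simulations and g from an independent
        batch of N simulations."""
        # First loop: value estimate only.
        v = sum(simulate(position, theta) for _ in range(M)) / M
        # Second, independent loop: gradient estimate only.
        g = [0.0] * len(theta)
        for _ in range(N):
            z, psi_sum = simulate_with_trace(position, theta)
            g = [gi + z * p / N for gi, p in zip(g, psi_sum)]
        return v, g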

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread Yamato
David Silver wrote:
> Yes, in our experiments they were just constant numbers M=N=100.

If M and N are the same, is there any reason to run M simulations and N simulations separately? What happens if you combine them and calculate V and g in a single loop?

> Okay, let's continue the example above

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread David Silver
IMO other people's equations/code/ideas/papers always seem smarter than your own. The stuff you understand and do yourself just seems like common sense, and the stuff you don't always has a mystical air of complexity, at least until you understand it too :-)

On 30-Apr-09, at 1:59 PM, Michael Williams wrote:

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread David Silver
Hi Yamato,

> Thanks for the detailed explanation. M, N and alpha are constant
> numbers, right? What did you set them to?

You're welcome! Yes, in our experiments they were just constant numbers M=N=100. The feature vector is the set of patterns you use, with value 1 if a pattern is matched and 0 otherwise
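A minimal sketch of such a pattern-based softmax playout policy (binary features; pattern_matches() and the state API are hypothetical):

    import math, random

    def softmax_policy(state, theta, patterns):
        """Choose a move with probability proportional to
        exp(theta . phi), where phi[i] = 1 if pattern i matches around
        the candidate move and 0 otherwise."""
        moves = state.legal_moves()
        scores = [math.exp(sum(theta[i] for i, p in enumerate(patterns)
                               if pattern_matches(p, state, move)))
                  for move in moves]
        # Roulette-wheel sampling over the softmax scores.
        r = random.random() * sum(scores)
        for move, score in zip(moves, scores):
            r -= score
            if r <= 0:
                return move
        return moves[-1]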

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread Michael Williams
I wish I was smart :(

David Silver wrote:
> Hi Remi,
>
> > I understood this. What I find strange is that using -1/1 should be
> > equivalent to using 0/1, but your algorithm behaves differently: it
> > ignores lost games with 0/1, and uses them with -1/1.
>
> Imagine you add a big constant to z. One million, say.

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread David Silver
Hi Remi,

> I understood this. What I find strange is that using -1/1 should be
> equivalent to using 0/1, but your algorithm behaves differently: it
> ignores lost games with 0/1, and uses them with -1/1.

Imagine you add a big constant to z. One million, say. This does not change the problem. Y

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread Rémi Coulom
David Silver wrote:
> Sorry, I should have made it clear that this assumes that we are
> treating black wins as z=1 and white wins as z=0. In this special
> case, the gradient is the average of psi over games in which black
> won. But yes, more generally you need to include games won by both
> sides.

The algori

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread David Silver
Hi Remi,

> This is strange: you do not take lost playouts into consideration. I
> believe there is a problem with your estimation of the gradient.
> Suppose for instance that you count z = +1 for a win, and z = -1 for
> a loss. Then you would take lost playouts into consideration.

This makes me a
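The equivalence being debated can be written out; this is the standard policy-gradient baseline argument (a sketch in the paper's notation, where \psi(s,a) = \nabla_\theta \log \pi_\theta(a|s); not a claim about either implementation):

    g = E[ z \sum_t \psi(s_t, a_t) ],
    E[ \sum_t \psi(s_t, a_t) ] = \sum_t E[ \nabla_\theta \log \pi_\theta(a_t|s_t) ] = 0,

so for any constant baseline b, E[ (z + b) \sum_t \psi ] = g + b \cdot 0 = g. Mapping z in {0,1} to z' = 2z - 1 in {-1,+1} therefore leaves the expected gradient unchanged up to a factor of 2 (absorbed by alpha); only the variance of the sample estimate differs.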

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread Rémi Coulom
Rémi Coulom wrote:
> The fundamental problem here may be that your estimate of the
> gradient is biased by the playout policy. You should probably sample
> X(s) uniformly at random to have an unbiased estimator.

Maybe this can be fixed with importance sampling, and then you may get a formula that is
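For reference, the generic importance-sampling correction has this shape (a sketch; whether it yields a practical formula here is exactly what is being asked): to estimate an expectation under a target distribution q, e.g. uniform over X(s), while sampling from the playout distribution p, reweight each sample:

    \hat{g} = \frac{1}{N} \sum_{i=1}^{N} \frac{q(x_i)}{p(x_i)} f(x_i),
    \qquad x_i \sim p,

which is unbiased for E_{x \sim q}[ f(x) ] provided p(x) > 0 wherever q(x) > 0.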

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread Rémi Coulom
David Silver wrote:
> 2. Run another N simulations, average the value of psi(s,a) over all
> positions and moves in games that black won (call this g)

This is strange: you do not take lost playouts into consideration. I believe there is a problem with your estimation of the gradient. Suppose for instance that you count z = +1 for a win, and z = -1 for a loss.

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-29 Thread Yamato
David Silver wrote:
> A: Estimate value V* of every position in a training set, using deep
> rollouts.
>
> B: Repeat, for each position in the training set
>  1. Run M simulations, estimate value of position (call this V)
>  2. Run another N simulations, average the value of psi(s,a) over
>     all positions and moves in games that black won (call this g)
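Assembled into one loop, the quoted procedure looks roughly like this (a sketch with hypothetical helpers: deep_rollout_value(), simulate() returning an outcome z, and simulate_with_trace() returning z together with the summed psi vector):

    def simulation_balancing(training_set, theta, alpha=0.1, M=100, N=100, epochs=10):
        """Fit playout-policy weights theta so that shallow playout
        values match deep-rollout targets (simulation balancing)."""
        # A: deep-rollout targets, computed once per position.
        v_star = {pos: deep_rollout_value(pos) for pos in training_set}
        for _ in range(epochs):  # B: repeat
            for pos in training_set:
                # B.1: M simulations -> playout value estimate V.
                v = sum(simulate(pos, theta) for _ in range(M)) / M
                # B.2: N independent simulations -> gradient estimate g.
                g = [0.0] * len(theta)
                for _ in range(N):
                    z, psi_sum = simulate_with_trace(pos, theta)
                    g = [gi + z * p / N for gi, p in zip(g, psi_sum)]
                # Step theta by the value error times the gradient estimate.
                theta = [t + alpha * (v_star[pos] - v) * gi
                         for t, gi in zip(theta, g)]
        return theta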

[computer-go] Re: [computer go] Monte-Carlo Simulation Balancing

2009-04-29 Thread David Silver
Hi Remi,

> What komi did you use for 5x5 and 6x6?

I used 7.5 komi for both board sizes.

> I find it strange that you get only 70 Elo points from supervised
> learning over uniform random. Don't you have any feature for atari
> extension? This one alone should improve strength immensely (extend stri
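For readers unfamiliar with the heuristic: an atari-extension feature tries to rescue a friendly string reduced to a single liberty. A sketch of what the check might look like (board API entirely hypothetical):

    def atari_extension_moves(state):
        """Candidate moves that extend a friendly string left in atari
        (exactly one liberty) by the opponent's last move."""
        moves = []
        for string in state.friendly_strings_adjacent_to(state.last_move()):
            if string.liberty_count() == 1:
                lib = string.liberties()[0]
                # Only extend if the move is legal and actually gains liberties.
                if state.is_legal(lib) and state.liberties_after(lib, string) > 1:
                    moves.append(lib)
        return moves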

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-29 Thread Michael Williams
David Silver wrote:
> Hi Michael,
>
> > But one thing confuses me: You are using the value from Fuego's 10k
> > simulations as an approximation of the actual value of the position.
> > But isn't the actual value of the position either a win or a loss?
> > On such small boards, can't you assume that Fuego is ab

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-29 Thread David Silver
Hi Yamato,

> Could you give us the source code which you used? Your algorithm is
> too complicated, so it would be very helpful if possible.

Actually I think the source code would be much harder to understand! It is written inside RLGO, and makes use of a substantial existing framework that w

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-28 Thread Don Dailey
But I'm only trying to make a point, not pin the analogy down perfectly. Naturally the stronger the player, the more likely his moves will conform to the level of the top players. The basic principle is that the longer the contest, the more opportunities a strong player has to demonstrate his superiority

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-28 Thread Yamato
David Silver wrote:
> > because the previous approaches were not optimized for such small
> > boards.
>
> I'm not sure what you mean here? The supervised learning and
> reinforcement learning approaches that we compared against are both
> trained on the small boards, i.e. the pattern weights are

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-28 Thread terry mcintyre
"...an association of men who do violence to the rest of us." - Leo Tolstoy

From: steve uurtamo
Sent: Tuesday, April 28, 2009 5:09:20 PM
Subject: Re: [computer-go] Monte-Carlo Simulation Balancing

> also, i'm not sure that a lot of most amate

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-28 Thread steve uurtamo
also, i'm not sure that a lot of most amateurs' moves are very good. the spectrum of bad moves is wide, it's just that it takes someone many stones stronger to severely punish small differences between good and nearly-good moves. among players of relatively similar strength, these differences will

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-28 Thread Don Dailey
A simplistic model that helps explain this is golf. On a single hole, even a casual golfer has a realistic chance of out-golfing Tiger Woods. Tiger occasionally shoots 1 over par on some hole, and even weak amateurs occasionally par or even birdie a hole. It's not going to happen a lot, but

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-28 Thread Ivan Dubois
> I noticed that, in general, changes in the playout policy have a much
> bigger impact on larger boards than on smaller boards.
>
> Rémi

I think rating differences are amplified on larger boards. This is easy to see if you think about it this way: somehow a 19x19 board is like four 9x9 boards. Let u
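A quick back-of-envelope version of that amplification argument (under the crude assumptions that the big board behaves like four independent sub-boards and that the game is won by taking a majority of them, with ties split evenly):

    from math import comb

    def majority_win_prob(p, n=4):
        """Probability of winning a majority of n independent
        sub-boards, each won with probability p; ties count half."""
        tie = comb(n, n // 2) * p**(n // 2) * (1 - p)**(n // 2) if n % 2 == 0 else 0.0
        more = sum(comb(n, k) * p**k * (1 - p)**(n - k)
                   for k in range(n // 2 + 1, n + 1))
        return more + 0.5 * tie

    print(majority_win_prob(0.6))  # ~0.65: a 60% per-board edge grows on the full board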

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-28 Thread Rémi Coulom
David Silver wrote:
> > I don't think the 200+ Elo improvement is so impressive
>
> I agree that it would be much more impressive to report positive
> results on larger boards. But perhaps it is already interesting that
> tuning the balance of the simulation policy can make a big
> difference on small boa

[computer-go] Monte-Carlo Simulation Balancing

2009-04-28 Thread David Silver
Hi Yamato,

> I like your idea, but why do you use only 5x5 and 6x6 Go?

1. Our second algorithm, two-ply simulation balancing, requires a training set of two-ply rollouts. Rolling out every position from a complete two-ply search is very expensive on larger board sizes, so we would probably

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-27 Thread Yamato
David Silver wrote:
> Please find attached my ICML paper with Gerry Tesauro on
> automatically learning a simulation policy for Monte-Carlo Go. Our
> preliminary results show a 200+ Elo improvement over previous
> approaches, although our experiments were restricted to simple
> Monte-Carlo search w

[computer-go] Monte-Carlo Simulation Balancing

2009-04-27 Thread David Silver
Hi Michael,

> But one thing confuses me: You are using the value from Fuego's 10k
> simulations as an approximation of the actual value of the position.
> But isn't the actual value of the position either a win or a loss?
> On such small boards, can't you assume that Fuego is able to
> correctly de

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-27 Thread Michael Williams
My favorite part: "One natural idea is to use the learned simulation policy in Monte-Carlo search, and generate new deep search values, in an iterative cycle." But one thing confuses me: You are using the value from Fuego's 10k simulations as an approximation of the actual value of the position.

[computer-go] Monte-Carlo Simulation Balancing

2009-04-27 Thread David Silver
Hi Remi,

> If I understand correctly, your method makes your program 250 Elo
> points stronger than my pattern-learning algorithm on 5x5 and 6x6,
> by just learning better weights.

Yes, although this is just in a very simple MC setting. Also we did not compare directly to the algorithm you used

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-27 Thread Rémi Coulom
David Silver wrote:
> Hi everyone, Please find attached my ICML paper with Gerry Tesauro
> on automatically learning a simulation policy for Monte-Carlo Go.
> Our preliminary results show a 200+ Elo improvement over previous
> approaches, although our experiments were restricted to simple
> Monte-Carlo

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-27 Thread Michael Williams
Finally! I guess you can add this technique to your list, Lukasz.

David Silver wrote:
> Hi everyone, Please find attached my ICML paper with Gerry Tesauro
> on automatically learning a simulation policy for Monte-Carlo Go.
> Our preliminary results show a 200+ Elo improvement over previous
> approa