Why does this pose a problem? Presumably the Monte Carlo evaluator will give the same position a similar score, given enough time. That would just produce a duplicate training pattern, or two training patterns with identical input and slightly different output. I guess I don't quite understand what the issue would be.
- George

On 5/17/07, Daniel Burgos <[EMAIL PROTECTED]> wrote:
But it is very unlikely that a board position will be repeated between games. I don't see how you would use the training pairs in the new games.

2007/5/17, George Dahl <[EMAIL PROTECTED]>:
>
> What I am actually proposing is collapsing the results of the playouts
> offline and then having a function that maps board positions to
> playout values without actually doing playouts. So I would use an MC
> player to generate a lot of training pairs of the form (position,
> score), where position would be some sort of vector representation of
> the board position and score would be a single scalar value
> corresponding to the value the Monte Carlo program, after many
> simulations, decided the position had. So I would create something
> that learned a value function for positions. Then, once training was
> complete, this function could be evaluated very quickly and hopefully
> give the same results that, say, 100k playouts would give. But since
> it was so fast, the program could use the extra time to do more
> playouts. The difference would be that the new playouts would be
> biased by the value function: the probability of picking a move would
> be proportional to the value function evaluated on the resulting
> position. This way I would be bootstrapping a general global position
> evaluator and using it to improve Monte Carlo simulations.
>
> Imagine if you had a Monte Carlo program that took almost no time to
> run. You could use it to do "heavy" playouts for another Monte Carlo
> program to make it even stronger.
>
> The reason this might be easier than learning from a database of
> professional games is that it is easy to generate scalar scores of
> positions with a Monte Carlo program. Presumably it is also easier to
> learn how to play like a mediocre Monte Carlo program than like a
> professional player.
>
> - George
>
> On 5/17/07, Zach Keatts <[EMAIL PROTECTED]> wrote:
> > What you would have after your training/evaluator phase is heuristic
> > knowledge of possibly better Monte Carlo trees to consider. This
> > will definitely cut down on the search space, but it could also
> > eliminate a strong search path. I have been thinking along these
> > same lines for some time. The problem then lies in deciding which
> > trees would be worth looking at initially. What about a database of
> > professional games? Take the "winning" games as examples of strong
> > searches that ended in a win. The problem is even more complex,
> > because where in the "winning tree" do you tell Monte Carlo to start
> > searching? Will you assign a higher probability to each move in
> > those games (defining a known, probabilistically stronger predicted
> > result)?
> >
> > That is one approach. The other is the purely simulated approach,
> > where you run simulations and gradually allow your probability
> > function to evolve based on the results. Although this is a purer
> > approach, I think the aforementioned strategy may yield some useful
> > experimentation. Its strong point is that it takes advantage of the
> > human brain's innate pattern recognition and calculation skills.
> > Since we have recorded games, we have plenty of examples of this
> > "thought process". For a 9-dan winning game, those trees are surely
> > worth investigating...
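The biasing step George describes amounts to sampling playout moves in proportion to the learned evaluation of the positions they lead to. A minimal sketch in Python, assuming hypothetical value_net and apply_move hooks (neither name comes from the thread):

import random

def biased_playout_move(position, legal_moves, value_net, apply_move):
    # Score each candidate move by the learned value of the resulting
    # position; clamp at a small positive floor so every legal move
    # keeps a nonzero chance of being sampled.
    scores = [max(value_net(apply_move(position, m)), 1e-6)
              for m in legal_moves]
    # Roulette-wheel selection: probability proportional to score.
    r = random.uniform(0.0, sum(scores))
    acc = 0.0
    for move, score in zip(legal_moves, scores):
        acc += score
        if r <= acc:
            return move
    return legal_moves[-1]  # guard against floating-point rounding

In practice one might sharpen or flatten the distribution (e.g. a softmax with a temperature), but proportional sampling is the scheme as George states it.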
> >
> > On 5/16/07, George Dahl <[EMAIL PROTECTED]> wrote:
> > > I find Monte-Carlo Go a fascinating avenue of research, but what
> > > pains me is that a huge number of simulations are performed each
> > > game and at the end of the game the results are thrown out. So what
> > > I was thinking is that perhaps the knowledge generated by the
> > > simulations could be collapsed in some way.
> > >
> > > Suppose that epsilon-greedy versions of a reasonably strong MC Go
> > > program were to play a very large number of games against
> > > themselves. By epsilon-greedy versions I mean that with probability
> > > epsilon a random move is played, and with probability 1 - epsilon
> > > the move the MC player would normally play is played. Each position
> > > in these games would be stored along with the Monte Carlo/UCT
> > > evaluation of that position's desirability. This would produce an
> > > arbitrarily large database of position/score pairs. At this point a
> > > general function approximator / learning algorithm (such as a
> > > neural network) could be trained to map positions to scores. If
> > > this were successful, it would produce something that could map
> > > positions to scores very quickly (even a large neural net
> > > evaluation, or what have you, would be much faster than doing a
> > > large number of MC playouts). Obviously the scores would not be
> > > perfect, since the Monte Carlo program did not play anywhere near
> > > perfect Go. But this static evaluator could then be plugged back
> > > into the Monte Carlo player and used to bias the random playouts.
> > > Wouldn't it be useful to be able to quickly estimate the MC score
> > > without doing any playouts?
> > >
> > > Clearly this idea could be extended recursively with a lot of
> > > offline training. What makes this formulation more valuable is
> > > that, given enough time and effort, someone familiar with machine
> > > learning should be able to produce a learning architecture that can
> > > actually learn the MC scores. It would be a straightforward, if
> > > potentially quite difficult, supervised learning task with
> > > effectively unlimited data, since more could be generated at will.
> > > Such a learning architecture could be used in the manner I
> > > described above or thrown at the more general reinforcement
> > > learning problem.
> > >
> > > Does anyone have any thoughts on this idea? Does anyone know of it
> > > being tried before?
> > >
> > > - George
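The data-generation loop George sketches is simple to state concretely. A minimal sketch, again in Python, assuming a hypothetical game interface (new_game, legal_moves, play, over, encode) and hypothetical mc_best_move / mc_evaluate functions standing in for the MC/UCT player; none of these names come from the thread:

import random

def generate_pairs(new_game, mc_best_move, mc_evaluate,
                   num_games, epsilon=0.1):
    """Epsilon-greedy self-play: record (encoded position, MC score)
    pairs for later supervised training."""
    pairs = []
    for _ in range(num_games):
        game = new_game()
        while not game.over():
            # Store the position together with its MC/UCT evaluation.
            pairs.append((game.encode(), mc_evaluate(game)))
            # With probability epsilon play a random legal move,
            # otherwise the move the MC player would normally play.
            if random.random() < epsilon:
                move = random.choice(game.legal_moves())
            else:
                move = mc_best_move(game)
            game.play(move)
    return pairs

Any regression learner (a neural network, as George suggests, or anything else) could then be fit to the resulting pairs; and as George notes above, repeated positions with slightly different scores are harmless, since the learner effectively averages them.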
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/