What you would have after your training/evaluator phase is heuristic knowledge of possibly better Monte Carlo trees to consider. This will definitely cut down on the search space, but it could also cut off a strong search path. I have been thinking along these same lines for some time. The problem then lies in deciding which trees are worth looking at initially. What about a database of professional games? Take the "winning" games as examples of strong searches that ended in a win. The problem is even more complex, though: where in the "winning tree" do you tell Monte Carlo to start searching? Would you assign a higher probability to each move played in those games (defining a known, probabilistically stronger predicted result)?
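Something along these lines is what I am picturing. This is only a rough Python sketch; the record format, position keys, and function names are made up for illustration, not taken from any existing engine:

import random
from collections import defaultdict

def build_priors(pro_game_records):
    # pro_game_records: iterable of (position_key, move, won_by_mover) tuples
    # extracted from a database of professional games (format assumed here).
    counts = defaultdict(lambda: defaultdict(int))
    for position_key, move, won_by_mover in pro_game_records:
        if won_by_mover:  # only count moves made by the eventual winner
            counts[position_key][move] += 1
    # Normalize counts into per-position move probabilities.
    priors = {}
    for position_key, move_counts in counts.items():
        total = sum(move_counts.values())
        priors[position_key] = {m: c / total for m, c in move_counts.items()}
    return priors

def biased_playout_move(position_key, legal_moves, priors, epsilon=0.1):
    # Mostly follow the pro-game prior, but keep some pure exploration so
    # strong paths outside the database are not cut off entirely.
    prior = priors.get(position_key)
    if prior is None or random.random() < epsilon:
        return random.choice(legal_moves)
    weights = [prior.get(m, 1e-3) for m in legal_moves]  # small floor for unseen moves
    return random.choices(legal_moves, weights=weights, k=1)[0]

The epsilon term is there precisely because of the danger mentioned above: the prior should narrow the search, not close it.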
That is one approach. The other is the purely simulated approach, where you run simulations and gradually allow your probability function to evolve based on the results. Although that is the purer approach, I think the aforementioned strategy may yield some useful experimentation. Its strong point is that it takes advantage of the human brain's innate pattern recognition and calculation skills, and since we have recorded games, we have plenty of examples of that "thought process". For a 9-dan's winning game, those trees are surely worth investigating...
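For contrast, the purely simulated alternative might look something like the following sketch, where the move probabilities start uniform and evolve from playout results. The decision-log format is again made up, and some helper that plays a game with choose_move and records each (position_key, move, side) decision plus the winner is assumed:

import random
from collections import defaultdict

# position_key -> move -> weight, starting uniform
weights = defaultdict(lambda: defaultdict(lambda: 1.0))

def choose_move(position_key, legal_moves):
    w = [weights[position_key][m] for m in legal_moves]
    return random.choices(legal_moves, weights=w, k=1)[0]

def update_from_result(decisions, winner, step=0.05):
    # Let the probability function evolve: reward moves the winning side
    # played, penalize moves the losing side played.
    for position_key, move, side in decisions:
        if side == winner:
            weights[position_key][move] *= (1.0 + step)
        else:
            weights[position_key][move] *= (1.0 - step)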
On 5/16/07, George Dahl <[EMAIL PROTECTED]> wrote:

I find Monte-Carlo Go a fascinating avenue of research, but what pains me is that a huge number of simulations are performed each game and at the end of the game the results are thrown out. So what I was thinking is that perhaps the knowledge generated by the simulations could be collapsed in some way.

Suppose that epsilon-greedy versions of a reasonably strong MC Go program were to play a very large number of games against themselves. By epsilon-greedy versions I mean that with probability epsilon a random move is played, and with probability 1 - epsilon the move the MC player would normally play is played. Each position in these games would be stored along with the Monte Carlo/UCT evaluation of that position's desirability. This would produce an arbitrarily large database of position/score pairs.

At this point a general function approximator / learning algorithm (such as a neural network) could be trained to map positions to scores. If this was successful, it would produce something that could very quickly (even a large neural net evaluation, or what have you, would be much faster than doing a large number of MC playouts) map positions to scores. Obviously the scores would not be perfect, since the Monte Carlo program did not play anywhere near perfect Go. But this static evaluator could then be plugged back into the Monte Carlo player and used to bias the random playouts. Wouldn't it be useful to be able to quickly estimate the MC score without doing any playouts? Clearly this idea could be extended recursively with a lot of offline training.

What makes this formulation more valuable is that, given enough time and effort, someone familiar with machine learning should be able to produce a learning architecture that can actually learn the MC scores. It would be a straightforward, if potentially quite difficult, supervised learning task with effectively unlimited data, since more could be generated at will. Such a learning architecture could be used in the manner I described above or thrown at the more general reinforcement learning problem.

Does anyone have any thoughts on this idea? Does anyone know of it being tried before?
- George
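For concreteness, here is one way George's pipeline could be sketched in Python. Everything here is a stand-in: mc_evaluate, mc_best_move, initial_position, legal_moves, and play_move represent whatever MC/UCT engine is available, encode is a placeholder feature encoding, and the linear model is just a stand-in for the neural network he mentions:

import random
import numpy as np

def encode(position):
    # Placeholder feature encoding: flatten the board into a vector.
    # A real encoding would use board planes, liberties, etc.
    return np.asarray(position, dtype=float).ravel()

def generate_dataset(mc_evaluate, mc_best_move, initial_position,
                     legal_moves, play_move, num_games=1000, epsilon=0.1):
    # Epsilon-greedy self-play: keep the MC/UCT evaluation of every
    # position visited instead of throwing it away at the end of the game.
    data = []
    for _ in range(num_games):
        pos = initial_position()
        while legal_moves(pos):
            score = mc_evaluate(pos)            # MC/UCT estimate of desirability
            data.append((encode(pos), score))
            if random.random() < epsilon:
                move = random.choice(legal_moves(pos))
            else:
                move = mc_best_move(pos)
            pos = play_move(pos, move)
    return data

def train_evaluator(data, lr=1e-4, epochs=10):
    # Fit a linear position -> score approximator by stochastic gradient
    # descent on squared error; a neural net would slot in the same way.
    X = np.array([x for x, _ in data])
    y = np.array([s for _, s in data])
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            w -= lr * (xi @ w - yi) * xi
    return lambda position: float(encode(position) @ w)  # fast static evaluation

The returned evaluator could then be plugged back into the playout policy exactly as described above, and the whole loop repeated for the recursive version.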
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/