Why does this pose a problem? Presumably the Monte Carlo evaluator
will give the same position a similar score, assuming it has enough
time. This would just produce a duplicate training pattern, or two
training patterns with identical input and slightly different output.
I guess I don't quite understand the objection.
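For what it's worth, a toy illustration (my own sketch, nothing from
any real program) of why I would expect such duplicates to be harmless
under squared-error training:

    # Two training pairs with identical input and slightly different
    # targets. Under squared error, the fitted value at that input is
    # just their mean, so duplicates average out rather than conflict.
    pairs = [("pos_A", 0.62), ("pos_A", 0.58)]  # one position, two noisy MC scores
    scores = [s for _, s in pairs]
    fitted = sum(scores) / len(scores)  # least-squares minimizer at a repeated input
    print(fitted)  # 0.6: the learner simply averages the noise away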
On Thu, 2007-05-17 at 12:17 -0400, George Dahl wrote:
> Imagine if you had a Monte Carlo program that took almost no time to
> run. You would use it to do "heavy" playouts for another Monte Carlo
> program to make it even stronger.
I tried something like this as a test with simple Monte Carlo.
On Thu, 2007-05-17 at 10:54 -0500, Zach Keatts wrote:
> What you would have after your training/evaluator phase is heuristic
> knowledge of possibly better Monte Carlo trees to consider. This will
> definitely cut down on the search space, but could also prune away a
> strong search path. I have been thinking along these same lines for
> some time.
But it is very unlikely that a board position will be repeated between
games. I don't see how you would use the training pairs in new games.
2007/5/17, George Dahl <[EMAIL PROTECTED]>:
What I am actually proposing is collapsing the results of the playouts
offline and then having a function that maps board positions to
playout values without actually doing playouts. So I would use an MC
player to generate a lot of training pairs of the form (position,
score) where position would be a board position and score the averaged
result of many playouts from it.
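A rough Python sketch of what I mean (random_position and mc_score are
toy stand-ins, not code from a real Go engine, and plain least squares
is just a placeholder for whatever learner you prefer):

    import numpy as np

    rng = np.random.default_rng(0)

    def random_position():
        # Stand-in: a 9x9 board as a flat array of {-1, 0, +1}.
        return rng.integers(-1, 2, size=81)

    def mc_score(pos):
        # Stand-in for the slow MC evaluator: a noisy "win rate" in [0, 1].
        return float(np.clip(0.5 + 0.01 * pos.sum() + rng.normal(0, 0.05),
                             0.0, 1.0))

    # Offline: label a large batch of positions with the MC player.
    X = np.array([random_position() for _ in range(5000)], dtype=float)
    y = np.array([mc_score(p) for p in X])

    # Collapse the playout knowledge into a fast linear evaluator
    # (bias column added so the fit can capture the 0.5 baseline).
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

    def evaluate(pos):
        # A playout value with no playouts at all.
        return float(np.append(pos, 1.0) @ w)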
What you would have after your training/evaluator phase is heuristic
knowledge of possibly better Monte Carlo trees to consider. This will
definitely cut down on the search space, but could also prune away a
strong search path. I have been thinking along these same lines for
some time.
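To make that trade-off concrete, a small sketch (my own assumptions,
reusing the toy evaluate() and flat-array positions from the sketch
above): keep only the top-k moves by predicted value. The search space
shrinks exactly as described, but any strong move the heuristic
underrates is never visited at all.

    import numpy as np

    def legal_moves(pos):
        # Empty points on the toy board.
        return np.flatnonzero(pos == 0)

    def play(pos, move, colour=1):
        child = pos.copy()
        child[move] = colour
        return child

    def candidate_moves(pos, k=5):
        moves = legal_moves(pos)
        values = np.array([evaluate(play(pos, m)) for m in moves])
        best = np.argsort(values)[::-1][:k]
        return moves[best]  # moves outside the top k are never searched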
On May 17, 2007, at 8:17 AM, Brian Slesinsky wrote:
A weakness of this approach is that sometimes the best move depends on
how you plan to follow it up; a program that plays the theoretically
best move but doesn't know how to follow it up is weaker than a
program that plays safer moves.
I think there is something to this; it seems like it should be
possible to use a database of randomly selected positions from games
along with the best known follow-up, and use that as a faster way of
testing a program's strength than playing full games. Such a database
would be valuable for all sorts of purposes.
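One cheap way to score a program against such a database (a sketch;
the (position, best_move) pair format and the choose_move callable are
my assumptions, not anything proposed here):

    def suite_score(choose_move, database):
        # database: list of (position, best_known_move) pairs.
        # Returns the fraction of positions where the program reproduces
        # the best known follow-up, as a cheap proxy for playing strength.
        hits = sum(1 for pos, best in database if choose_move(pos) == best)
        return hits / len(database)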
I find Monte-Carlo Go a fascinating avenue of research, but what pains
me is that a huge number of simulations are performed each game and at
the end of the game the results are thrown out. So what I was
thinking is that perhaps the knowledge generated by the simulations
could be collapsed in some way and reused.
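The most literal form of that collapse I can picture (a sketch with my
own naming) is simply logging every playout result by position hash
during play and averaging it offline instead of discarding it:

    from collections import defaultdict

    playout_log = defaultdict(lambda: [0, 0])  # position hash -> [wins, visits]

    def record(pos_hash, won):
        # Log one playout result instead of throwing it out.
        entry = playout_log[pos_hash]
        entry[0] += int(won)
        entry[1] += 1

    def collapsed_value(pos_hash):
        # Average of every playout ever seen from this position, or None.
        wins, visits = playout_log[pos_hash]
        return wins / visits if visits else None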