On 5/18/07, Rémi Coulom <[EMAIL PROTECTED]> wrote:
David Silver wrote:
> Very interesting paper!
>
> I have one question. The assumption in your paper is that increasing
> the performance of the simulation player will increase the performance
> of Monte-Carlo methods that use that simulation player. However, we
> found in MoGo that this is not necessarily the case! Do you think
> there is some property of your learning algorithm that makes it
> particularly suitable for Monte-Carlo methods?
>
> Thanks!
> Dave

Maximizing the likelihood does not optimize the performance of the
simulation player. For instance, by making it more greedy, I am sure it
would become a stronger player. I have the feeling that maximizing the
likelihood produces a good balance between playing good moves and being
random. It would be worth testing the strength of the MC player with
more or less greedy versions of the random player to test this.
Rémi

My own take on this is that we don't really know what the right
probability distribution is to get MC simulations that predict well
the outcome of the game. What we can do is provide indications of what
properties these moves have and combine them with weights that will be
tuned by playing many games. Rémi's way of computing a probability
distribution is probably an acceptable way to pick, but it's perfectly
possible that using some power of his strengths would actually give a
stronger program.
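
To make the "power of his strengths" idea concrete, here is a minimal sketch. It assumes each candidate move already has a positive strength (the numbers below are made up for illustration) and picks a move with probability proportional to strength**power. The names move_probabilities and sample_move are hypothetical, not taken from any existing program. With power = 1 this is the plain distribution, power > 1 is greedier, and power < 1 is more random.

```python
import random


def move_probabilities(strengths, power=1.0):
    """Return probabilities proportional to strength ** power."""
    weights = [g ** power for g in strengths]
    total = sum(weights)
    return [w / total for w in weights]


def sample_move(moves, strengths, power=1.0, rng=random):
    """Pick one move at random according to the powered strengths."""
    probs = move_probabilities(strengths, power)
    return rng.choices(moves, weights=probs, k=1)[0]


# Hypothetical strengths for three candidate moves (illustration only).
example_moves = ["a", "b", "c"]
example_strengths = [4.0, 2.0, 1.0]

for power in (0.5, 1.0, 2.0):
    probs = move_probabilities(example_strengths, power)
    print(power, [round(p, 3) for p in probs])
```

Which power gives the strongest MC player would still have to be measured by playing games; the sketch only shows how the distribution changes.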
In our case, we are currently trying to estimate a score for each
move, which roughly corresponds to how many points we think the move
is worth. Then we'll make the probability of each move proportional to
exp(A*score), where A is a number to be tuned by playing many games.
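
As a rough sketch of what "probability proportional to exp(A*score)" means in code (the function name softmax_policy and the example scores are assumptions for illustration, not our actual implementation):

```python
import math


def softmax_policy(scores, a):
    """Return probabilities proportional to exp(a * score).

    Subtracting the maximum before exponentiating leaves the
    probabilities unchanged but avoids overflow for large a.
    """
    scaled = [a * s for s in scores]
    largest = max(scaled)
    weights = [math.exp(x - largest) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical point estimates for three candidate moves.
example_scores = [1.0, 2.0, 3.0]

for a in (0.0, 1.0, 5.0):
    probs = softmax_policy(example_scores, a)
    print(a, [round(p, 3) for p in probs])
```

With A = 0 every move is equally likely, and as A grows the policy approaches always playing the highest-scoring move, so A plays the same role as the power in the previous paragraph, with exp(score) in place of the strength.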
One could also use other "features" by making the probability
proportional to exp(A*score+B*is_capture+C*is_atari+...) but one would
have to tune A, B, C, ... by playing even more games. I guess we could
use the maximum-likelihood settings (what Rémi did) as an initial
guess, and then try our hand at this difficult optimization problem by
trying perturbations of those settings. There is a reason why we just
bought a powerful computer and we are going to buy more. :)
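
A minimal sketch of the feature-weighted version and of "trying perturbations", under loud assumptions: the feature names, the starting weights, and the toy objective are all invented for illustration. In a real setting evaluate() would play many games and return a (noisy) win rate, which is exactly why this is expensive; here a cheap stand-in function is used so that the example runs.

```python
import math
import random


def move_weight(features, weights):
    """Unnormalized probability exp(A*score + B*is_capture + ...)."""
    return math.exp(sum(weights[name] * value for name, value in features.items()))


def perturbation_search(initial, evaluate, step=0.1, iterations=200, rng=None):
    """Simple hill climbing: perturb all weights, keep the change if it helps."""
    rng = rng or random.Random(0)
    best = dict(initial)
    best_value = evaluate(best)
    for _ in range(iterations):
        candidate = {k: v + rng.gauss(0.0, step) for k, v in best.items()}
        value = evaluate(candidate)
        if value > best_value:
            best, best_value = candidate, value
    return best, best_value


# Hypothetical initial guess, e.g. taken from a maximum-likelihood fit.
initial_weights = {"score": 1.0, "is_capture": 0.5, "is_atari": 0.5}

# Stand-in objective with a made-up optimum; NOT a real strength measure.
hidden_optimum = {"score": 1.5, "is_capture": 0.2, "is_atari": 0.9}


def toy_objective(w):
    return -sum((w[k] - hidden_optimum[k]) ** 2 for k in w)


tuned_weights, tuned_value = perturbation_search(initial_weights, toy_objective)
print(tuned_weights, tuned_value)
```

With a noisy objective such as a win rate, one would need many games per candidate before trusting that a perturbation is really an improvement; the sketch ignores that issue.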
Álvaro.
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/