Hello,

Is there any known (by theory or tests) function of how much an increase
in the strength of the simulation policy increases the strength of the
MC/UCT program as a whole?

I think that is a very interesting question.
In our work on MoGo we found that using a stronger simulation policy
can actually decrease the strength of the MC/UCT program. That is why
MoGo's simulation policy is based on the "sequence idea" rather than
the "strength idea". Our best simulation policy is quite weak as a
player compared to others we tested.
We also have further experiments, from work with David Silver of the
University of Alberta. We found that the relation "strong simulation
policy" <=> "strong MC program" fails at a much larger scale as well:
the "intransitivity" holds even with much, much stronger simulation
policies.

Of course there is the simple counterexample of a deterministic
player. But our results hold even if we randomise the much stronger
policy (in many different ways, tuning the parameters as best we
can).
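
To make the deterministic case concrete, here is a minimal Python
sketch. It is only an illustration under stated assumptions: the toy
subtraction game (take 1 to 3 stones, whoever takes the last stone
wins) and all function names are hypothetical and have nothing to do
with MoGo's code. It shows that with a deterministic playout policy
every playout from a position gives the same result, so the Monte
Carlo estimate is 0 or 1 however many playouts are run.

```python
import random

def playout(stones, policy, rng):
    """Play the toy game to the end with both sides using `policy`.

    Returns 1 if the player to move at the start takes the last stone,
    otherwise 0.
    """
    player = 0  # 0 is the player to move at the start
    while True:
        stones -= policy(stones, rng)
        if stones == 0:
            return 1 if player == 0 else 0
        player = 1 - player

def random_policy(stones, rng):
    # Uniformly random legal move: take 1, 2 or 3 stones.
    return rng.randint(1, min(3, stones))

def greedy_policy(stones, rng):
    # Deterministic toy policy (an assumption for illustration):
    # always take as many stones as allowed.
    return min(3, stones)

def mc_estimate(stones, policy, n, seed=0):
    """Average result of n playouts from a pile of `stones`."""
    rng = random.Random(seed)
    return sum(playout(stones, policy, rng) for _ in range(n)) / n

if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, mc_estimate(5, greedy_policy, n),
              mc_estimate(5, random_policy, n))
```

From a pile of 5 the deterministic policy always plays 3 then 2, so
its estimate stays at 0.0 for any number of playouts, although taking
1 stone first actually wins. The random policy gives an estimate
strictly between 0 and 1 that is refined as playouts are added.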

I have some theory about this phenomenon in general, but it is not
"polished" enough for the moment. I really think that understanding
this experimental evidence deeply, beyond mere intuition, would help
us go further.
But maybe someone already has.

Sylvain
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
