I believe we used a uniform random policy (the only restriction being
"don't play in your own pseudo-eyes").
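For concreteness, here is a rough sketch of such a policy. The Board
interface and its method names below are made-up stand-ins, not Orego's
actual API:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;

    // Uniform random playout policy whose only restriction is
    // "don't play in your own pseudo-eyes".
    public class UniformRandomPolicy {

        // Hypothetical board abstraction; every method here is an assumption.
        public interface Board {
            int PASS = -1;
            List<Integer> vacantPoints();
            boolean isLegal(int point, int color);
            boolean isOwnPseudoEye(int point, int color);
        }

        private final Random random = new Random();

        // Shuffles the vacant points, then returns the first one that is
        // legal and not one of the mover's own pseudo-eyes. Taking the
        // first acceptable point of a uniform shuffle is uniform over the
        // acceptable points. Passes if no point qualifies.
        public int selectMove(Board board, int colorToPlay) {
            List<Integer> candidates = new ArrayList<>(board.vacantPoints());
            Collections.shuffle(candidates, random);
            for (int p : candidates) {
                if (board.isLegal(p, colorToPlay)
                        && !board.isOwnPseudoEye(p, colorToPlay)) {
                    return p;
                }
            }
            return Board.PASS;
        }
    }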
The numbers probably won't be the same, but we've certainly replicated
the qualitative improvement with version 6.05 of Orego, available here:
https://webdisk.lclark.edu/drake/orego/
Peter Drake
http://ww
Peter,
I gave this a whirl, but the win rate against UCB1-Tuned with a
first-move priority of 1.1 (as in MoGo) was only 33%. That was using
uniform random playouts.
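Concretely, the urgency I computed looks like the sketch below. This is
my reading of UCB1-Tuned plus MoGo's fixed first-move priority; the
class and parameter names are mine, not from any particular program:

    // UCB1-Tuned (Auer et al. 2002), with unvisited moves given a
    // fixed first-move priority of 1.1 as in MoGo.
    public final class Ucb1Tuned {

        private static final double FIRST_MOVE_PRIORITY = 1.1;

        // wins: total reward through this move so far
        // runs: number of simulations through this move
        // parentRuns: number of simulations through the parent node
        public static double urgency(double wins, int runs, int parentRuns) {
            if (runs == 0) {
                return FIRST_MOVE_PRIORITY; // unvisited: fixed priority
            }
            double mean = wins / runs;
            double logTerm = Math.log(parentRuns) / runs;
            // Empirical variance of a win/loss (Bernoulli) reward plus
            // the exploration slack from the UCB1-Tuned bound.
            double variance = mean - mean * mean + Math.sqrt(2.0 * logTerm);
            return mean + Math.sqrt(logTerm * Math.min(0.25, variance));
        }
    }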
What was the playout policy you used for this?
Christian
On 18/06/2009 21:04, Peter Drake wrote:
I thought about this a long time ago. My conclusion was that it would
only make a difference when the number of simulations is very small, and
that case should probably be covered by heuristics anyway, so I don't
think the standard-deviation refinement will matter much in the end.
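For anyone reading along without the formula at hand: if the refinement
meant here is the variance term of UCB1-Tuned (Auer et al., 2002), the
value assigned to move j is, in LaTeX notation,

    \bar{X}_j + \sqrt{ \frac{\ln n}{n_j} \min\!\left( \frac{1}{4},\; V_j \right) },
    \qquad
    V_j = \overline{X_j^2} - \bar{X}_j^2 + \sqrt{ \frac{2 \ln n}{n_j} }

where n is the number of simulations through the parent and n_j the
number through move j. For win/loss rewards \overline{X_j^2} = \bar{X}_j,
so V_j is just the empirical Bernoulli variance plus an exploration slack.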
Even though the article is about
On Thu, Jun 18, 2009 at 6:43 PM, Michael Williams wrote:
> Section 3.2 describes a pair of tests that took about 4.2 minutes each (if
> my calculations are correct). Why not play more games and have each game
> contain more simulations? Writing the code and the paper is the hard part,
> waiting for a computer to run your code is easy.
Section 3.2 describes a pair of tests that took about 4.2 minutes each (if my calculations are correct). Why not play more games and have each game contain
more simulations? Writing the code and the paper is the hard part, waiting for a computer to run your code is easy.
Peter Drake wrote: