My own experiments showed a very large increase in strength,
especially with about 1000 light playouts. I had many features
and parameters which may make a difference. But my
main hypothesis was that it shouldn't ...
I think that I weighted real simulations (i.e. the first move) 10 times more than
virtual ones (i.e. AMAF moves).
I assume that by expansion you mean allocating data for every child
of a node. So let's say your root node is called A: you would
then allocate statistics for each of its children. Then, when running a
simulation from A, you would first apply the UCT formula to
determine which first move to make ... afterwards
you would both back up the AMAF score for every move
played during the simulation (virtual scoring)
AND use the true first move for real scoring (the normal UCT part).
Of course, this shouldn't behave exactly like AMAF, because for one thing
you get a better judgment of the first move thanks to UCT, and
you also give the first move more weight. (Technically you probably
should do the same with plain AMAF, even though there the first move
is still 100% random. That is part of what
weighted AMAF does. But the first-move weight would be arbitrary,
defeating the purpose of a standard reference.)
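A minimal sketch of that combined selection rule (the names, the 10x real-move weight, and the exploration constant are illustrative assumptions, not anyone's exact code):

```python
import math

class Node:
    def __init__(self):
        self.wins = 0             # real wins (this move was the actual first move)
        self.playouts = 0         # real playouts through this child
        self.virtual_wins = 0     # AMAF wins (move appeared anywhere in a playout)
        self.virtual_playouts = 0 # AMAF playout count

REAL_WEIGHT = 10  # real (first-move) results count 10x more than AMAF ones

def combined_ratio(node):
    """Win ratio mixing real and virtual (AMAF) statistics."""
    w = REAL_WEIGHT * node.wins + node.virtual_wins
    n = REAL_WEIGHT * node.playouts + node.virtual_playouts
    return w / n if n > 0 else 0.5

def uct_select(children, parent_playouts, exploration=math.sqrt(0.2)):
    """Pick the child maximizing combined win ratio + UCT exploration term."""
    def value(node):
        # +1 terms avoid log(0) and division by zero on unvisited children
        explore = exploration * math.sqrt(
            math.log(parent_playouts + 1) / (node.playouts + 1))
        return combined_ratio(node) + explore
    return max(children, key=value)
```

After each playout, the true first move would update `wins`/`playouts`, while every move seen during the playout would update `virtual_wins`/`virtual_playouts`.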
Last, you could try to decrease the weight of the virtual score
as the number of simulations grows,
for example with something like (if nbsim < 1000):
winRatio = realRatio * (1000 + nbsim) / 3000 + virtualRatio * (2000 - nbsim) / 3000
I'm not sure about the formula :) but you get the idea of taking the realRatio
more and more into account. I think what I actually did was not take the
realRatio into account at all at first when estimating the winRatio, and then
increase its weight linearly up to the point where I wouldn't take the
virtualRatio into account at all.
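A hedged sketch of that linear schedule (the 1000-simulation cutoff is just the example figure from above, and the function name is made up):

```python
def blended_ratio(real_ratio, virtual_ratio, nbsim, cutoff=1000):
    """Linearly shift weight from the virtual (AMAF) ratio to the real one.

    At nbsim = 0 the estimate is all virtualRatio; at nbsim >= cutoff
    it is all realRatio, matching the 'increase linearly' idea above.
    """
    alpha = min(nbsim / cutoff, 1.0)  # fraction of weight on the real ratio
    return alpha * real_ratio + (1.0 - alpha) * virtual_ratio
```

Any monotone schedule would do; the point is only that the AMAF signal dominates while real statistics are scarce, then fades out.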
The drawback with all that is that you will certainly have to make several
versions, although they would share a lot of code. I don't think that trying
to get one version to behave as several is a good idea. You should probably
keep the code frozen once you have tested it thoroughly, for reference
purposes. Then you'll be able to know exactly the strength of a particular
piece of software, even two years from now. I didn't do that. I tried
to make it evolve. And with poor svn versioning discipline, and poor
management of the link between measurements and software, I'm now at a point
where all I can do is make random guesses, although I made millions of
experiments over the last few years :) I just can't link those experiments
back to a fixed piece of software.
Then, if by any means you think a piece of code is bug-free (which is hard
to bet on), you really should put it somewhere, with all the proof (and
experiments) you can associate with it. (Even on a web page if you ask me,
or a thesis.)
To me that is the most useful part of the standard ref bot:
because a lot of people have tried it, you can be sure of the exact link
between the software and the experimental results. It makes assertions
reproducible, and that's really great.
> From: [EMAIL PROTECTED]
> Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot
> Date: Fri, 28 Nov 2008 01:48:09 -0200
> To: computer-go@computer-go.org
> CC:
>
>
> On 27-nov-08, at 19:50, Denis fidaali wrote:
>
> > So, you use AMAF for "simulating" the first UCT evaluations?
> > I thought the classical way to use AMAF was to affect only the
> > win/lose ratio portion of the UCT equation.
> > Obviously it should be allowed to use an arbitrarily
> > large number of AMAF simulations, accumulating them longer
> > than it takes to expand a node.
> > I think that a classical way to affect the win/lose ratio is to
> > decrease the effect of the AMAF correction as the number
> > of simulations grows.
> >
> > If you test with a very low number of simulations
> > (in the 1000-3000 range), I think you should be
> > able to get a very nice improvement out of the
> > AMAF version. If you don't, I would think that something
> > is wrong somewhere.
> >
> > What test process do you use for this version ?
>
> I tested it mostly doing 2,000 playouts. When AMAF is true I create a
> map of virtual-win values of all the moves played during a playout.
> These values get accumulated over all the playouts (not just the
> first ones). The 'virtual-value' of a move is calculated as follows:
>
> exploration-factor * UCT
>     + (nr-wins*2 + nr-virtual-wins) / (nr-playouts*2 + nr-virtual-playouts)
>
> where the exploration-factor is currently sqrt(0.2) and UCT is
> sqrt( log( nr-parent-playouts ) / ( nr-playouts + 1 ) )
>
> Like I said, I haven't had time to experiment much so this formula
> may not be any good. I had also expected to see some positive effect
> of the virtual-win / virtual-playout ratio from AMAF, but I see none.
> Of course it's also possible I have a different kind of bug still.
>
> What happens in my 'formula' is that when the tree never expands beyond
> the first level, which happens when the number of simulations equals the
> number of simulations before expansion, the virtual-value becomes
> completely determined by nr-virtual-wins / nr-virtual-playouts, making it
> equivalent to the original ref-bot. In case it does expand further and
> creates a tree, the actual win-loss ratio is weighed twice as heavily as
> the virtual win-loss ratio. That seemed like a reasonable first try. I
> have tried a few others, but usually got results that were not much
> different, or were much worse.
>
> Mark
>
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/