Denis,

Thanks for the feedback.

What you describe is pretty much what I do, with the small difference that from A I don't assign any real score to the first move until A has had a minimum number of playouts, at which point it expands further. Assume that limit is 10 playouts. For the first 10 playouts, all the virtual scores are distributed among A's children. Then it chooses the child with the best AMAF score (let's call it B), creates B's child nodes, and does a playout. The virtual score is then distributed over the child nodes of B, and over the child nodes of A. I've never seen distributing the AMAF values up the tree mentioned anywhere, so maybe it's not necessary; I didn't see much difference whether I did or didn't.
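For concreteness, here is a toy sketch of that scheme (the names, the integer "moves" and the random playout are all mine, and the real-score bookkeeping from the formula quoted below is left out to keep it short):

    import random

    MOVES = list(range(25))      # toy "board": moves are just integers
    EXPANSION_THRESHOLD = 10     # minimum playouts before a node expands further

    class Node:
        def __init__(self, move=None):
            self.move = move
            self.children = []
            self.playouts = 0            # playouts that passed through this node
            self.virtual_wins = 0        # AMAF ("virtual") score
            self.virtual_playouts = 0

        def amaf_ratio(self):
            if self.virtual_playouts == 0:
                return float('inf')      # prefer untried children
            return self.virtual_wins / self.virtual_playouts

        def expand(self):
            self.children = [Node(m) for m in MOVES]

    def random_playout():
        # Toy stand-in for a light playout: a set of moves and a result.
        moves = set(random.sample(MOVES, 10))
        won = random.random() < 0.5
        return moves, won

    def simulate(root):
        # Descend by best AMAF score through nodes that have reached the
        # threshold; a freshly chosen child creates its children right away.
        path = [root]
        node = root
        while node.playouts >= EXPANSION_THRESHOLD:
            node = max(node.children, key=Node.amaf_ratio)
            if not node.children:
                node.expand()
            path.append(node)
        moves, won = random_playout()
        # Distribute the virtual score over the children of *every* node on
        # the path, i.e. up the tree as well, not only at the deepest node.
        for n in path:
            n.playouts += 1
            for child in n.children:
                if child.move in moves:
                    child.virtual_playouts += 1
                    if won:
                        child.virtual_wins += 1

    root = Node()
    root.expand()
    for _ in range(2000):
        simulate(root)
    best = max(root.children, key=Node.amaf_ratio)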

The idea is not to waste any of the AMAF information gathered. So it acts exactly like Don's ref-bot if it never expands. And if it expands after each playout, it naturally always awards the real score to the first move, making this approach the same as what you described.

It's quite possible the devil is in the details. At one time I ran a test where I did see the expected gain from AMAF, and then I didn't manage to reproduce it. So I don't know if I accidentally ran the wrong version or what. Usually when I run into an anomaly that I can't reproduce, I rely on Time Machine to go back to the state my computer was in when I started the test. But as it happens, just when running that test I had unplugged the disk to take it elsewhere. Murphy's law, I guess.

I did expect to end up with two different versions. But I already have a different ref-bot implementation and a different tree-search implementation. Having those two anchors, I can freely experiment with the current one. If that leads me to split versions further I'll do so, but I haven't found a reason to yet. I'll also try weighting the real simulations more, like you did; I just haven't had the time yet. Hopefully I can do a lot more experimenting next week.

Mark


On 28-nov-08, at 06:32, Denis fidaali wrote:

My own experiments showed a very big increase in strength,
especially with about 1000 light playouts. I had many features
and parameters which may make a difference. But my
main hypothesis was that they shouldn't ...

I think I weighted real simulations (i.e. the first move) 10 times more than
virtual ones (i.e. AMAF moves).
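In other words, something like this (a minimal sketch; the names and the factor of 10 are from my memory):

    REAL_WEIGHT = 10   # a real (first-move) result counts as much as 10 AMAF ones

    def win_ratio(real_wins, real_playouts, virtual_wins, virtual_playouts):
        # Weighted win ratio mixing real and virtual (AMAF) statistics.
        total = REAL_WEIGHT * real_playouts + virtual_playouts
        if total == 0:
            return 0.5   # no information yet
        return (REAL_WEIGHT * real_wins + virtual_wins) / total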

I assume that by expansion you mean allocating data for every child
of a node. So let's say your root node is called A: you would
then allocate a record for each of its children. Then when you run a simulation
from A, you would first apply the UCT formula to
determine which first move to make ... then
you would both collect the AMAF score of every move
played during the simulation, for virtual scoring,
AND use the true first move for real scoring (the normal UCT part).
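Concretely, one simulation could look like this (a rough sketch with invented names; the toy playout just returns random moves and a random result, and the sqrt(0.2) constant is borrowed from the formula quoted later in this thread):

    import math
    import random

    class Stats:
        def __init__(self, move):
            self.move = move
            self.wins = 0                # real score: this was the true first move
            self.playouts = 0
            self.virtual_wins = 0        # virtual score: AMAF
            self.virtual_playouts = 0

    def uct_value(s, parent_playouts, exploration=math.sqrt(0.2)):
        if s.playouts == 0:
            return float('inf')          # try every move at least once
        return (s.wins / s.playouts
                + exploration * math.sqrt(math.log(parent_playouts) / s.playouts))

    def toy_playout(first_move, all_moves):
        rest = random.sample([m for m in all_moves if m != first_move], 9)
        return {first_move, *rest}, random.random() < 0.5

    def run_simulation(children, parent_playouts):
        all_moves = [c.move for c in children]
        # 1. UCT determines which first move to make...
        first = max(children, key=lambda c: uct_value(c, parent_playouts))
        moves, won = toy_playout(first.move, all_moves)
        # 2. ...the true first move gets the real score (normal UCT part)...
        first.playouts += 1
        first.wins += 1 if won else 0
        # 3. ...and every move played in the simulation gets the virtual score.
        for c in children:
            if c.move in moves:
                c.virtual_playouts += 1
                c.virtual_wins += 1 if won else 0

    children = [Stats(m) for m in range(25)]
    for n in range(1, 1001):
        run_simulation(children, n)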

Of course, this shouldn't behave exactly like AMAF, because for one thing
you get a better judgment of the first move thanks to UCT, and
you also make the first move weigh more. (Although technically you
should probably do that with plain AMAF too, even though the first move
there is still 100% random; that is part of what
weighted AMAF does. But the first-move weight would be arbitrary,
thus defeating the purpose of a standard reference.)

Lastly, you could try to decrease the weight of the virtual-score part
as the number of simulations grows,

for example with (if nbsim < 1000):
winRatio = [realRatio * (1000 + nbsim)] / 3000 + [virtualRatio * (2000 - (1000 + nbsim))] / 3000

I'm not sure about the formula :) but you get the idea of taking the realRatio more and more into account. I think what I actually did was not take the realRatio into account at all at first when estimating the winRatio, and then increase its weight linearly up to the point where I no longer took the virtualRatio into account.
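For instance, a clean linear version of that idea (my own normalization, with the 1000-simulation cutoff as an example) would be:

    def win_ratio(real_ratio, virtual_ratio, nbsim, cutoff=1000):
        # Shift trust linearly from the virtual (AMAF) ratio to the real one:
        # at nbsim = 0 only virtual_ratio counts; from nbsim >= cutoff only
        # real_ratio does.
        alpha = min(nbsim, cutoff) / cutoff
        return alpha * real_ratio + (1.0 - alpha) * virtual_ratio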


The drawback of all that is that you will certainly have to keep several versions, although they would share a lot of code. I don't think trying to get one version to behave as several is a good idea. You should keep the code frozen once you have tested it thoroughly, for reference purposes. Then you'll be able to know the exact strength of a particular piece of software, even two years from now. I didn't do that: I tried to make it evolve, and with poor svn versioning discipline and poor management of the link between measurements and software, I'm now at a point where all I can do is make random guesses, although I ran millions of experiments in the past years :) I just can't link those experiments back to a fixed piece of software.

Then, if by any means you think a piece of code is bug-free (which is hard to bet on), you really should publish it somewhere, with all the proof (and experiments) you can associate with it (even on a web page if you ask me, or in a thesis).
To me that is the most useful part of the standard ref bot:
because a lot of people have tried it, you can be sure of the exact link between the software and the experimental results. It makes assertions
reproducible, and that's really great.


> From: [EMAIL PROTECTED]
> Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot
> Date: Fri, 28 Nov 2008 01:48:09 -0200
> To: computer-go@computer-go.org
> CC:
>
>
> On 27-nov-08, at 19:50, Denis fidaali wrote:
>
> > So, you use AMAF for "simulating" the first UCT evaluations?
> > I thought the classical way to use AMAF was to affect only the
> > win/loss-ratio portion of the UCT equation.
> > Obviously it should be allowed to use an arbitrarily
> > large number of AMAF simulations, accumulating them for longer
> > than it takes to expand a node.
> > I think the classical way to affect the win/loss ratio is to
> > decrease the effect of the AMAF correction as the number
> > of simulations grows.
> >
> > If you test with a very low number of simulations
> > (in the 1000-3000 range), I think you should be
> > able to get a very nice improvement out of the
> > AMAF version. If you don't, I would think that something
> > is wrong somewhere.
> >
> > What test process do you use for this version?
>
> I tested it mostly doing 2,000 playouts. When AMAF is true I create a
> map of virtual-win values of all the moves played during a playout.
> These values get accumulated over all the playouts (not just the
> first ones). The 'virtual-value' of a move is calculated as follows:
>
> exploration-factor * UCT
>   + (nr-wins*2 + nr-virtual-wins) / (nr-playouts*2 + nr-virtual-playouts)
>
> where the exploration-factor is currently sqrt(0.2) and
> UCT is sqrt( log(nr-parent-playouts) / (nr-playouts + 1) )
>
> Like I said, I haven't had time to experiment much so this formula
> may not be any good. I had also expected to see some positive effect
> of the virtual-win / virtual-playout ratio from AMAF, but I see none.
> Of course it's also possible I have a different kind of bug still.
>
> What happens with my 'formula' is that when it never expands beyond the
> first level, which is what happens when the number of simulations is
> equal to the number of simulations before expansion, the virtual-value
> becomes completely determined by nr-virtual-wins / nr-virtual-playouts,
> making it equivalent to the original ref-bot. In case it does expand
> further and create a tree, the actual win-loss ratio is weighted twice
> as heavily as the virtual win-loss ratio. That seemed like a reasonable
> first try. I have tried a few others, but usually got results that were
> not much different, or much worse.
>
> Mark
>
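To make the quoted formula concrete, here is a minimal transcription into code (a sketch; the names and the zero-division guard are mine):

    import math

    EXPLORATION = math.sqrt(0.2)

    def virtual_value(wins, playouts, virtual_wins, virtual_playouts,
                      parent_playouts):
        # The quoted formula: real wins count twice as much as virtual ones,
        # and the UCT term drives exploration.
        uct = math.sqrt(math.log(parent_playouts) / (playouts + 1))
        denom = playouts * 2 + virtual_playouts
        ratio = (wins * 2 + virtual_wins) / denom if denom else 0.5
        return EXPLORATION * uct + ratio

    # With no expansion beyond the first level, wins == playouts == 0 for
    # every child, so the ratio reduces to virtual_wins / virtual_playouts:
    # the original ref-bot behaviour, as described above.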


_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
