Denis,

Thanks for the feedback.

What you describe is pretty much what I do, with the small difference that from A I don't assign any real score to the first move until A has had a minimum number of playouts, at which point it expands further. Assume that limit is 10 playouts. For the first 10 playouts, all the virtual scores are distributed among A's children. Then it chooses the child with the best AMAF score (let's call it B), creates B's child nodes, and does a playout. The virtual score is then distributed over the child nodes of B, and over the child nodes of A. I've never seen distributing the AMAF values up the tree mentioned anywhere, so maybe it's not necessary; I didn't see much difference whether I did or didn't.
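For concreteness, here is a toy sketch of that scheme (the names, the integer "moves" and the random playout are all mine, and the real-score bookkeeping from the formula quoted below is left out to keep it short):

    import random

    MOVES = list(range(25))      # toy "board": moves are just integers
    EXPANSION_THRESHOLD = 10     # minimum playouts before a node expands further

    class Node:
        def __init__(self, move=None):
            self.move = move
            self.children = []
            self.playouts = 0            # playouts that passed through this node
            self.virtual_wins = 0        # AMAF ("virtual") score
            self.virtual_playouts = 0

        def amaf_ratio(self):
            if self.virtual_playouts == 0:
                return float('inf')      # prefer untried children
            return self.virtual_wins / self.virtual_playouts

        def expand(self):
            self.children = [Node(m) for m in MOVES]

    def random_playout():
        # Toy stand-in for a light playout: a set of moves and a result.
        moves = set(random.sample(MOVES, 10))
        won = random.random() < 0.5
        return moves, won

    def simulate(root):
        # Descend by best AMAF score through nodes that have reached the
        # threshold; a freshly chosen child creates its children right away.
        path = [root]
        node = root
        while node.playouts >= EXPANSION_THRESHOLD:
            node = max(node.children, key=Node.amaf_ratio)
            if not node.children:
                node.expand()
            path.append(node)
        moves, won = random_playout()
        # Distribute the virtual score over the children of *every* node on
        # the path, i.e. up the tree as well, not only at the deepest node.
        for n in path:
            n.playouts += 1
            for child in n.children:
                if child.move in moves:
                    child.virtual_playouts += 1
                    if won:
                        child.virtual_wins += 1

    root = Node()
    root.expand()
    for _ in range(2000):
        simulate(root)
    best = max(root.children, key=Node.amaf_ratio)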

The idea is not to waste any of the AMAF information gathered. So it acts exactly like Don's ref-bot if it never expands. And if it expands after each playout, it naturally always awards the real score to the first move, making this approach the same as what you described.

It's quite possible the devil is in the details. At one time I ran a test where I did see the expected gain from AMAF, and then I didn't manage to reproduce it. So I don't know if I accidentally ran the wrong version or what. Usually when I run into an anomaly that I can't reproduce, I rely on Time Machine to go back to the state my computer was in when I started the test. But as it happens, just when running that test I had unplugged the disk to take it elsewhere. Murphy's law, I guess.

I did expect to end up with two different versions. But I already have a different ref-bot implementation and a different tree-search implementation. Having those two anchors, I can freely experiment with the current one. If that leads me to split versions further I'll do so, but I haven't found a reason to yet. I'll also try weighting the real simulations more, like you did; I just haven't had the time yet. Hopefully I can do a lot more experimenting next week.

Mark


On 28-nov-08, at 06:32, Denis fidaali wrote:

My own experiments showed a very big increase in strength,
especially with about 1000 light playouts. I had many features
and parameters which may make a difference. But my
main hypothesis was that they shouldn't ...

I think I weighted real simulations (i.e. the first move) 10 times more than
virtual ones (i.e. AMAF moves).
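In other words, something like this (a minimal sketch; the names and the factor of 10 are from my memory):

    REAL_WEIGHT = 10   # a real (first-move) result counts as much as 10 AMAF ones

    def win_ratio(real_wins, real_playouts, virtual_wins, virtual_playouts):
        # Weighted win ratio mixing real and virtual (AMAF) statistics.
        total = REAL_WEIGHT * real_playouts + virtual_playouts
        if total == 0:
            return 0.5   # no information yet
        return (REAL_WEIGHT * real_wins + virtual_wins) / total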

I assume that by expansion you mean allocating data for every child
of a node. So let's say your root node is called A: you would
then allocate a record for each of its children. Then when you run a simulation
from A, you would first apply the UCT formula to
determine which first move to make ... then
you would both collect the AMAF score of every move
played during the simulation, for virtual scoring,
AND use the true first move for real scoring (the normal UCT part).
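Concretely, one simulation could look like this (a rough sketch with invented names; the toy playout just returns random moves and a random result, and the sqrt(0.2) constant is borrowed from the formula quoted later in this thread):

    import math
    import random

    class Stats:
        def __init__(self, move):
            self.move = move
            self.wins = 0                # real score: this was the true first move
            self.playouts = 0
            self.virtual_wins = 0        # virtual score: AMAF
            self.virtual_playouts = 0

    def uct_value(s, parent_playouts, exploration=math.sqrt(0.2)):
        if s.playouts == 0:
            return float('inf')          # try every move at least once
        return (s.wins / s.playouts
                + exploration * math.sqrt(math.log(parent_playouts) / s.playouts))

    def toy_playout(first_move, all_moves):
        rest = random.sample([m for m in all_moves if m != first_move], 9)
        return {first_move, *rest}, random.random() < 0.5

    def run_simulation(children, parent_playouts):
        all_moves = [c.move for c in children]
        # 1. UCT determines which first move to make...
        first = max(children, key=lambda c: uct_value(c, parent_playouts))
        moves, won = toy_playout(first.move, all_moves)
        # 2. ...the true first move gets the real score (normal UCT part)...
        first.playouts += 1
        first.wins += 1 if won else 0
        # 3. ...and every move played in the simulation gets the virtual score.
        for c in children:
            if c.move in moves:
                c.virtual_playouts += 1
                c.virtual_wins += 1 if won else 0

    children = [Stats(m) for m in range(25)]
    for n in range(1, 1001):
        run_simulation(children, n)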

Of course, this shouldn't behave exactly like AMAF, because for one thing
you get a better judgment of the first move thanks to UCT, and
you also make the first move weigh more. (Although technically you
should probably do that with plain AMAF too, even though the first move
there is still 100% random; that is part of what
weighted AMAF does. But the first-move weight would be arbitrary,
thus defeating the purpose of a standard reference.)

Lastly, you could try to decrease the weight of the virtual-score part
as the number of simulations grows,

for example with (if nbsim < 1000):
winRatio = [realRatio * (1000 + nbsim)] / 3000 + [virtualRatio * (2000 - (1000 + nbsim))] / 3000

I'm not sure about the formula :) but you get the idea of taking the realRatio more and more into account. I think what I actually did was not take the realRatio into account at all at first when estimating the winRatio, and then increase its weight linearly up to the point where I no longer took the virtualRatio into account.
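For instance, a clean linear version of that idea (my own normalization, with the 1000-simulation cutoff as an example) would be:

    def win_ratio(real_ratio, virtual_ratio, nbsim, cutoff=1000):
        # Shift trust linearly from the virtual (AMAF) ratio to the real one:
        # at nbsim = 0 only virtual_ratio counts; from nbsim >= cutoff only
        # real_ratio does.
        alpha = min(nbsim, cutoff) / cutoff
        return alpha * real_ratio + (1.0 - alpha) * virtual_ratio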


The drawback of all that is that you will certainly have to keep several versions, although they would share a lot of code. I don't think trying to get one version to behave as several is a good idea. You should keep the code frozen once you have tested it thoroughly, for reference purposes. Then you'll be able to know the exact strength of a particular piece of software, even two years from now. I didn't do that: I tried to make it evolve, and with poor svn versioning discipline and poor management of the link between measurements and software, I'm now at a point where all I can do is make random guesses, although I ran millions of experiments in the past years :) I just can't link those experiments back to a fixed piece of software.

Then, if by any means you think a piece of code is bug-free (which is hard to bet on), you really should publish it somewhere, with all the proof (and experiments) you can associate with it (even on a web page if you ask me, or in a thesis).
To me that is the most useful part of the standard ref bot:
because a lot of people have tried it, you can be sure of the exact link between the software and the experimental results. It makes assertions
reproducible, and that's really great.


> From: [EMAIL PROTECTED]
> Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot
> Date: Fri, 28 Nov 2008 01:48:09 -0200
> To: computer-go@computer-go.org
> CC:
>
>
> On 27-nov-08, at 19:50, Denis fidaali wrote:
>
> > So, you use AMAF for "simulating" the first UCT evaluations?
> > I thought the classical way to use AMAF was to affect only the
> > win/loss-ratio portion of the UCT equation.
> > Obviously it should be allowed to use an arbitrarily
> > large number of AMAF simulations, accumulating them for longer
> > than it takes to expand a node.
> > I think the classical way to affect the win/loss ratio is to
> > decrease the effect of the AMAF correction as the number
> > of simulations grows.
> >
> > If you test with a very low number of simulations
> > (in the 1000-3000 range), I think you should be
> > able to get a very nice improvement out of the
> > AMAF version. If you don't, I would think that something
> > is wrong somewhere.
> >
> > What test process do you use for this version?
>
> I tested it mostly doing 2,000 playouts. When AMAF is true I create a
> map of virtual-win values of all the moves played during a playout.
> These values get accumulated over all the playouts (not just the
> first ones). The 'virtual-value' of a move is calculated as follows:
>
> exploration-factor * UCT
>   + (nr-wins*2 + nr-virtual-wins) / (nr-playouts*2 + nr-virtual-playouts)
>
> where the exploration-factor is currently sqrt(0.2) and
> UCT is sqrt( log(nr-parent-playouts) / (nr-playouts + 1) )
>
> Like I said, I haven't had time to experiment much so this formula
> may not be any good. I had also expected to see some positive effect
> of the virtual-win / virtual-playout ratio from AMAF, but I see none.
> Of course it's also possible I have a different kind of bug still.
>
> What happens with my 'formula' is that when it never expands beyond the
> first level, which is what happens when the number of simulations is
> equal to the number of simulations before expansion, the virtual-value
> becomes completely determined by nr-virtual-wins / nr-virtual-playouts,
> making it equivalent to the original ref-bot. In case it does expand
> further and create a tree, the actual win-loss ratio is weighted twice
> as heavily as the virtual win-loss ratio. That seemed like a reasonable
> first try. I have tried a few others, but usually got results that were
> not much different, or much worse.
>
> Mark
>
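To make the quoted formula concrete, here is a minimal transcription into code (a sketch; the names and the zero-division guard are mine):

    import math

    EXPLORATION = math.sqrt(0.2)

    def virtual_value(wins, playouts, virtual_wins, virtual_playouts,
                      parent_playouts):
        # The quoted formula: real wins count twice as much as virtual ones,
        # and the UCT term drives exploration.
        uct = math.sqrt(math.log(parent_playouts) / (playouts + 1))
        denom = playouts * 2 + virtual_playouts
        ratio = (wins * 2 + virtual_wins) / denom if denom else 0.5
        return EXPLORATION * uct + ratio

    # With no expansion beyond the first level, wins == playouts == 0 for
    # every child, so the ratio reduces to virtual_wins / virtual_playouts:
    # the original ref-bot behaviour, as described above.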


_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
