Re: [computer-go] Another study based on the reference bot

Michael Williams Mon, 27 Oct 2008 11:47:20 -0700

It's actually more bare-bones (in the sense that there is less code) if you 
consider the update loop that I showed.



Don Dailey wrote:

Great information.   I'll include a version of this to my scalability

study. Is this the C version?

wins[] and hits[] are integer arrays and weight is a fraction less than
1.0, so I'm not sure how this works.  Did you change hits and wins to be
doubles?

There are many enhancements possible, and I'm not going to change the
definition,  it was my intention to provide a very basic "bare-bones"
reference implementation for others to build on.

- Don



On Mon, 2008-10-27 at 13:01 -0400, Michael Williams wrote:

The following modification to AMAF seems to perform better and scale better. The idea is to weight the moves at the beginning of the playout heavier than themoves at the end of the playout. It's probably not a new idea.


This code from the reference implementation:
        wins[mv] += sc;
        hits[mv]++;

Becomes this:
        double weight = 1.0 - (double)(i - savctm) / (ctm - savctm);
        wins[mv] += weight * sc;
        hits[mv] += weight;

If you are not familiar with the reference code, here are the meanings of the 
variables in the code above:
        i is the loop variable, counting from savctm to ctm
        mv iterates over each move in the playout
        sc is 1 or -1, depending on the outcome of the playout
        ctm is the move count at the end of the playout
        savctm is the move count at the beginning of the playout
        hits is the number of times a given move was played
        wins is the number of times a given move resulted in a playout win

At    15 playouts per move, the modified version wins 54.0% of the time (±3.5%) 
after 200 games.
At    30 playouts per move, the modified version wins 54.0% of the time (±3.5%) 
after 200 games.
At    60 playouts per move, the modified version wins 54.5% of the time (±3.5%) 
after 200 games.
At   125 playouts per move, the modified version wins 53.0% of the time (±3.5%) 
after 200 games.
At   250 playouts per move, the modified version wins 54.0% of the time (±3.5%) 
after 200 games.
At   500 playouts per move, the modified version wins 55.5% of the time (±3.5%) 
after 200 games.
At  1000 playouts per move, the modified version wins 57.5% of the time (±3.5%) 
after 200 games.
At  2000 playouts per move, the modified version wins 63.0% of the time (±3.4%) 
after 200 games.
At  4000 playouts per move, the modified version wins 63.5% of the time (±3.4%) 
after 200 games.
At  8000 playouts per move, the modified version wins 63.5% of the time (±3.4%) 
after 200 games.
At 16000 playouts per move, the modified version wins 71.0% of the time (±3.2%) 
after 200 games.

Because of the weighting, it is probably safe to remove the code that checks to see if the move was previously played before awarding credit. Doing so andincrementally calculating the weight would yield this simple and fast update loop after each playout:


         // Track win statistics using weighted AMAF - (All Moves As First)
         // ---------------------------------------------------------------
         double weight = 1.0;
         double weightDelta = 2.0 / (ctm - savctm + 1);
         for (int i = savctm; i < ctm; i += 2)
         {
             int mv = mvs[i] & MASK;

             wins[mv] += weight * sc;
             hits[mv] += weight;
             weight -= weightDelta;
         }

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

------------------------------------------------------------------------

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Another study based on the reference bot

Reply via email to