Re: [computer-go] UCT memory issues and scalability

Łukasz Lew Tue, 16 Jan 2007 11:24:31 -0800

There is a good argument why 100 is ok.
When You have about 50 children, then waiting 100 playouts before
start of attaching them results only in 2 playouts per child loss, so
I guess even higher threshold should be OK.


Lukasz

On 1/16/07, Don Dailey <[EMAIL PROTECTED]> wrote:

I have been doing a lot of experiments with the scalability and
memory usage of UCT.    I'm using the exact scheme that was
described like this in a previous posting by someone:

 Here is a summary of how it works:
  - Use probability of winning as score, not territory
  - Use the average outcome as position value
  - Select the move that maximizes v + sqrt((2*log(t))/(10*n))

  v is the value of the move (average outcome, between 0 and 1), n the
  number of simulations of this move, and t the total number of
  simulations at the current position. In case a move has n = 0, it is
  selected first.


For these experiments, I use random play-outs.   This version should
be a relatively generic implementation of what many of us are doing,
without the frills and a relatively stable bug-free implementation.

Each version does 2x the number of play-outs as previous version
and the level of the program is defined as the number of play-outs
* 1024 - so level 2 does 2048 playouts, etc.   I am going to test
as far as I can stand it like this:

   level 1
   level 2
   level 4
   level 8
   level 16
   ....

Additionally,  I am testing 2 variants, because I am interesting in
saving memory.   The base version does the basic thing - always
expanding a node in the tree once it has been visited one time,
or in other words each probe is guaranteed to add 1 new node to
the tree.

The second variant will not expand a node until it has been
visited 5 times - expanding on the 6th visit.   This version is
significantly more stingy about memory use.

So far, after running well over 1000 games I cannot detect an
advantage over using the less memory hungry version.   There
seems to be no point in wasting the memory.

In a separate less systematic test I did last week,  I also
tested values as high as 100 visits before expansion at levels
comparable or higher to CGOS time controls.   In that test,
I tested 1 visit, 5 visits, 10 visits and 100 visits before
expansion.

After playing a massive round robin (where each player played
a total of over 200 games)  I could not prove empirically that
even 100 visits is too much.  The ELO rating difference varied
only 40 ELO points from high to low but the 1 visit version,
which is supposed to be the best,  was the second lowest rated
version and the 100 visit version score 10 ELO points higher.
This is all statistical noise.
In this implementation the 100 visit version was substantially
faster which makes it a no-brainer - however I replaced my
node management routines with a fixed pool of nodes so there
is no malloc or free operations.   Now the slowdown is very
minor due to excessive node expansions.

Does anyone have any reason to believe we should be expanding
so aggressively as after only 1 visit?   My tests indicate
that you can easily get away with anything less than 100.

Here are the results of the current more systematic test.  I
will continue to expand this by introducing higher levels
to the autotester and dropping the lower ones in order to
perform a scalability study.    We are seeing about 200 ELO
per doubling at these levels.

s05_032 means:  5 visits before expand - level 32 which
                is 32K play-outs per move.

 Perf
 Rating  Win perc  Tot Gms  Ave Time  Player
-------  --------  -------  --------  -------
 2107.1    90.749      227      50.3  s05_032
 2102.7    90.323      217      52.2  s01_032
 1942.1    79.452      219      32.1  s05_016
 1916.3    78.027      223      32.6  s01_016
 1764.3    65.447      246      16.1  s01_008
 1753.1    61.200      250      15.3  s05_008
 1542.1    40.000      245       6.8  s01_004
 1526.5    39.914      233       6.3  s05_004
 1284.3    22.273      220       1.8  s05_002
 1274.8    22.594      239       1.8  s01_002
 1000.0     7.623      223       0.2  s05_001
  972.4     5.932      236       0.3  s01_001

Black wins:      652      46.9 %
White wins:      737      53.1 %



- Don







_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] UCT memory issues and scalability

Reply via email to