I've been doing some interesting scalability studies with Lazarus.

On the big 19x19 boards, along with the help of others, we tested
versions of Lazarus against other versions of Lazarus at different
levels.  We set up individual versions of Lazarus where the weakest
version was Lazarus doing 1024 play-outs and each subsequent version
doubled the number of nodes examined.  Each version would play the
version above it and below it.
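
A minimal sketch of that ladder, for illustration only.  The function
names and the number of levels (8) are hypothetical; only the 1024
play-out base and the doubling come from the description above.

```python
def ladder_levels(base_playouts=1024, num_levels=8):
    """Play-out budget per version: base, 2*base, 4*base, ...

    num_levels=8 is an arbitrary illustrative value, not the real test size.
    """
    return [base_playouts * 2 ** i for i in range(num_levels)]


def neighbor_pairings(levels):
    """Pair each version with the one directly above it.

    Listing (weaker, stronger) once per adjacent pair covers "plays the
    version above it and below it" for every version on the ladder.
    """
    return list(zip(levels, levels[1:]))


levels = ladder_levels()
pairings = neighbor_pairings(levels)
for weaker, stronger in pairings:
    print(f"{weaker:>7} play-outs  vs  {stronger:>7} play-outs")
```

With N levels this gives N-1 pairings, so the CPU cost is dominated by
the top few levels, each of which costs twice as much as the one below.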

The test was eventually discontinued because it was requiring an enormous
amount of CPU time and the trend we saw was consistent.  It was rare
for one version to lose to the version just beneath it.  There was
some indication that at the higher levels tested the superiority was
slightly diminishing, but we cannot say that with a lot of statistical
confidence.  It would require many months of CPU time to arrive at
solid conclusions.  I will give more detail in a future report and
compile the data for you to see and draw your own conclusions.

Meanwhile, I've also been doing some interesting scalability testing
against GnuGo 3.7.9.  The idea is to see how much difference each
doubling in the number of play-outs makes in the result.  I am testing
with 5x5, 7x7, and 9x9 boards.  The version of Lazarus I am using is
UCT based with what Dave Hillis refers to as "heavy play-outs",
similar to what Mogo does with simple patterns and such.

I think you will find the results interesting.  I started with 5x5
boards.  Since BOTH programs play close to perfection on 5x5 boards,
the issue is what KOMI to set.  The fairest komi proved to be 24.5 (my
program does not deal with non-fractional komi).  With perfect play
Black should win every point on the board even with 24.5 komi, but
moving up to 25.5 is clearly silly as White would never lose even if
he passes every time.  So 24.5 is the best I can do.
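
The komi arithmetic can be checked with a tiny sketch.  It assumes that
Black's best possible margin is the full board (every point Black's,
none White's); the function name is hypothetical.

```python
def black_can_win(board_size, komi):
    """True if Black's best possible margin (the whole board) exceeds komi."""
    max_margin = board_size * board_size  # assumption: Black owns every point
    return max_margin > komi


for komi in (24.5, 25.5):
    print(f"5x5, komi {komi}: Black can win = {black_can_win(5, komi)}")
```

At 24.5 Black wins only by taking all 25 points; at 25.5 no result on
the board is enough.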


Here is a table showing what happened:


Boardsize: 5x5
     Komi: 24.5
 Opponent: gg-3.7.9

K-nodes   Score when Black   Score when White    Combined score
-------   ----------------   ----------------   ----------------
1024       25/25  =100.00      0/25  =  0.00     25/50  = 50.00
0512       24/25  = 96.00      1/25  =  4.00     25/50  = 50.00
0256       21/25  = 84.00      1/25  =  4.00     22/50  = 44.00
0128       22/25  = 88.00      0/25  =  0.00     22/50  = 44.00
0064       22/25  = 88.00      0/25  =  0.00     22/50  = 44.00
0032       16/25  = 64.00      0/25  =  0.00     16/50  = 32.00
0016       17/25  = 68.00      0/25  =  0.00     17/50  = 34.00
0008       16/25  = 64.00      0/25  =  0.00     16/50  = 32.00
0004       16/25  = 64.00      3/25  = 12.00     19/50  = 38.00
0002       14/25  = 56.00      0/25  =  0.00     14/50  = 28.00
0001        3/25  = 12.00      0/25  =  0.00      3/50  =  6.00


I found it interesting that gnugo occasionally loses with BLACK, as
does Lazarus at most levels.  From the table it can be seen that
Lazarus tends to improve with black for each doubling.  At 1 million
play-outs (the 1024 row) it appears that Lazarus is playing close to
perfect with black, as it never lost a game with that color.  Of course
only 25 games were played with each color.
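
To put a number on how little 25 games can prove, here is a minimal
sketch using the Wilson score interval.  This is a standard textbook
interval that I am bringing in for illustration; it is not part of the
analysis that produced the tables, and the function name is
hypothetical.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a win rate (z=1.96 is roughly 95%)."""
    if games == 0:
        raise ValueError("no games played")
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


low, high = wilson_interval(25, 25)
print(f"25/25 wins: win rate plausibly between {low:.3f} and {high:.3f}")
```

Even a perfect 25/25 only bounds the true win rate below at roughly
87%, so "close to perfect" is about as strong a claim as the data
supports.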

Clearly, 5x5 is too simple for such a test since there is such a
strong bias for black to win.  This is one test where it might make
sense to try 25.0 komi and score draws.  

Now let's look at 7x7 boards.  The 7x7 and 9x9 games are still in
progress but here is the table of what I have so far in 7x7:


Boardsize: 7x7
     Komi: 8.5
 Opponent: gg-3.7.9

K-nodes   Score when Black   Score when White    Combined score
-------   ----------------   ----------------   ----------------
1024       19/19  =100.00     17/20  = 85.00     36/39  = 92.31
0512       24/24  =100.00     17/25  = 68.00     41/49  = 83.67
0256       19/19  =100.00      8/19  = 42.11     27/38  = 71.05
0128       22/22  =100.00      7/22  = 31.82     29/44  = 65.91
0064       21/21  =100.00      5/21  = 23.81     26/42  = 61.90
0032       23/23  =100.00      4/24  = 16.67     27/47  = 57.45
0016       19/21  = 90.48      4/22  = 18.18     23/43  = 53.49
0008       15/24  = 62.50     10/25  = 40.00     25/49  = 51.02
0004        8/23  = 34.78      3/24  = 12.50     11/47  = 23.40
0002        4/17  = 23.53      2/18  = 11.11      6/35  = 17.14
0001        2/15  = 13.33      0/16  =  0.00      2/31  =  6.45



What is interesting about this test is that the superiority of
Lazarus over gg-3.7.9 at this board size is substantially greater than
with the 5x5 games when I run an ELO rating analysis.  Also, Lazarus
NEVER lost a game as BLACK once doing at least 32K play-outs (the 0032
row and above).

Also, there appears to be a very heavy black bias using komi of 8.5.
I don't know the correct komi for these games, but I heard somewhere
that the correct komi is 9.0.  As in the 5x5 test, since Lazarus
doesn't know how to do non-fractional komi, I am forced to choose which
side gets the raw end of the komi deal.

However, it's still interesting to note that at the highest level
tested, Lazarus can still win 85% of the games against gg-3.7.9 with
the white stones.

Here is the rating analysis for the 7x7 games.  I normalize 
gg-3.7.9 to be 2000.0 ELO exactly:


PLAYER             RATING
----------------  -------
1024               2328.3
0512               2313.8
0256               2246.2
0128               2173.0
0064               2155.1
0032               2098.0
0016               2005.4
gg-3.7.9           2000.0
0008               1910.3
0004               1762.4
0002               1660.5
0001               1377.1
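
To read a rating gap as a win rate, here is a minimal sketch of the
standard logistic ELO expectation (400-point scale).  This is the
textbook formula only; it is an assumption that it approximates the
rating analysis used for the table, so the percentages it gives need
not match the raw scores.  The function name is hypothetical.

```python
def expected_score(rating, opponent_rating):
    """Expected score under the standard logistic ELO model."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


GG_RATING = 2000.0  # gg-3.7.9 is the fixed anchor in the table

# A few rows copied from the 7x7 rating table.
ratings_7x7 = {"1024": 2328.3, "0016": 2005.4, "0001": 1377.1}

for level, rating in ratings_7x7.items():
    pct = 100 * expected_score(rating, GG_RATING)
    print(f"{level}: expected score vs gg-3.7.9 = {pct:.1f}%")
```

Equal ratings give 50%, and each 400 points of difference multiplies
the odds of winning by ten.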



Now for the 9x9 games.  My intent is to run more than 50 games for
each pairing in this case, but as you can see from the table only a few
games have been played at the higher levels so far.  Here is the table
for the 9x9 games:


Boardsize: 9x9
     Komi: 7.5
 Opponent: gg-3.7.9

K-nodes   Score when Black   Score when White    Combined score
-------   ----------------   ----------------   ----------------
1024        4/4   =100.00      5/5   =100.00      9/9   =100.00
0512        6/7   = 85.71      8/8   =100.00     14/15  = 93.33
0256        6/7   = 85.71      6/7   = 85.71     12/14  = 85.71
0128       22/27  = 81.48     23/27  = 85.19     45/54  = 83.33
0064       18/26  = 69.23     18/26  = 69.23     36/52  = 69.23
0032       15/28  = 53.57      8/29  = 27.59     23/57  = 40.35
0016       10/27  = 37.04     15/28  = 53.57     25/55  = 45.45
0008       10/28  = 35.71     11/28  = 39.29     21/56  = 37.50
0004        3/28  = 10.71      7/29  = 24.14     10/57  = 17.54
0002        2/26  =  7.69      2/26  =  7.69      4/52  =  7.69
0001        0/26  =  0.00      1/26  =  3.85      1/52  =  1.92


Again, you see a great deal of scalability: gg-3.7.9 easily beats
Lazarus at the lower levels, but at the higher levels gg-3.7.9 has
little chance.  The rating chart is not as accurate for the 9x9 games
due to the low number of games played, but here it is based on what has
been played so far:


PLAYER             RATING 
-------------     ------- 
1024               2647.6 
0512               2444.4 
0256               2309.4 
0128               2278.3 
0064               2140.3 
gg-3.7.9           2000.0 
0016               1967.9 
0032               1931.6 
0008               1910.8 
0004               1730.4 
0002               1573.1 
0001               1411.2 


It appears that at 9x9 Lazarus needs more play-outs to equalize with
gnugo.  However, it also appears that at higher levels the superiority
is even greater than in the 7x7 games.  This is non-intuitive and
probably not really the case - I assume this is due to sampling error
since fewer games have been played on this board size.  We also need to
see the highest level lose a few games, since its perfect score so far
surely distorts the table significantly (you cannot get an accurate ELO
rating if you don't win AND lose some games).
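
A minimal sketch of why a perfect score is a problem, assuming the
plain logistic ELO model (not necessarily the analysis used for the
tables).  The 9/10 line is a hypothetical result, included only to show
how much a single loss would change the picture; the function name is
also hypothetical.

```python
import math


def elo_diff_from_score(wins, games):
    """Rating difference implied by a raw score under the logistic model."""
    if games == 0:
        raise ValueError("no games played")
    if wins == games:
        return math.inf   # a perfect score puts no upper bound on the gap
    if wins == 0:
        return -math.inf  # likewise for a total loss, in the other direction
    p = wins / games
    return 400.0 * math.log10(p / (1.0 - p))


print("9/9 :", elo_diff_from_score(9, 9))    # actual 1024 row on 9x9
print("9/10:", elo_diff_from_score(9, 10))   # hypothetical: one extra game, lost
```

With 9/9 the implied gap is unbounded, so whatever finite number a
rating program reports for that row depends on its own assumptions
rather than on the games.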

I may try 11x11 or 13x13 boards at a later time, focusing on lower
levels since the higher levels are very time consuming.  I will post
an update to this in a few days once I've gathered substantially more
data.



- Don


_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
