I've been doing some interesting scalability studies with Lazarus. On the big 19x19 boards, with the help of others, we tested versions of Lazarus against other versions of Lazarus at different levels. We set up individual versions of Lazarus where the weakest version did 1024 play-outs and each subsequent version doubled the number of play-outs. Each version would play the version above it and the version below it.
The test was eventually discontinued: it required an enormous amount of CPU time, and the trend we saw was consistent. It was rare for one version to lose to the version just beneath it. There was some indication that at the higher levels tested the superiority was slightly diminishing, but we cannot say that with much statistical confidence. It would require many months of CPU time to arrive at solid conclusions. I will give more detail in a future report and compile the data for you to see and draw your own conclusions.

Meanwhile, I've also been doing some interesting scalability testing against GnuGo 3.7.9. The idea is to see how much difference each doubling in the number of play-outs makes in the result. I am testing with 5x5, 7x7, and 9x9 boards. The version of Lazarus I am using is UCT based with what Dave Hillis refers to as "heavy play-outs", similar to what Mogo does with simple patterns and such. I think you will find the results interesting.

I started with 5x5 boards. Since BOTH programs play close to perfection on 5x5 boards, the issue is what KOMI to set. The fairest komi proved to be 24.5 (my program does not deal with non-fractional komi). With perfect play Black should win every point on the board, so Black still wins by half a point even with 24.5 komi, but moving up to 25.5 is clearly silly, as White would never lose even if he passes every time. So 24.5 is the best I can do.
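The komi arithmetic above can be checked with a few lines of Python. This is only a minimal sketch, assuming area scoring on a 25-point board; the function name `result` is made up for illustration and is not part of Lazarus or GnuGo.

```python
def result(black_area: int, white_area: int, komi: float) -> str:
    """Return "B", "W" or "draw" for a finished game.

    Hypothetical helper: assumes area scoring, with komi added to
    White's side of the ledger.
    """
    margin = black_area - white_area - komi
    if margin > 0:
        return "B"
    if margin < 0:
        return "W"
    return "draw"


# Best case for Black on 5x5: all 25 points.
print(result(25, 0, 24.5))  # Black wins by half a point
print(result(25, 0, 25.5))  # White wins even though Black owns the board
print(result(25, 0, 25.0))  # an integer komi makes a draw possible
print(result(24, 1, 24.5))  # one lost point and Black no longer wins
```

With 24.5 komi Black has to take the entire board to win, which is why this board size is such a harsh test.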
Here is a table showing what happened:

Boardsize: 5x5
Komi:      24.5
Opponent:  gg-3.7.9

K-nodes  Score when Black  Score when White  Combined score
-------  ----------------  ----------------  ----------------
1024     25/25 = 100.00     0/25 =   0.00    25/50 =  50.00
0512     24/25 =  96.00     1/25 =   4.00    25/50 =  50.00
0256     21/25 =  84.00     1/25 =   4.00    22/50 =  44.00
0128     22/25 =  88.00     0/25 =   0.00    22/50 =  44.00
0064     22/25 =  88.00     0/25 =   0.00    22/50 =  44.00
0032     16/25 =  64.00     0/25 =   0.00    16/50 =  32.00
0016     17/25 =  68.00     0/25 =   0.00    17/50 =  34.00
0008     16/25 =  64.00     0/25 =   0.00    16/50 =  32.00
0004     16/25 =  64.00     3/25 =  12.00    19/50 =  38.00
0002     14/25 =  56.00     0/25 =   0.00    14/50 =  28.00
0001      3/25 =  12.00     0/25 =   0.00     3/50 =   6.00

I found it interesting that gnugo occasionally loses with WHITE as does Lazarus at most levels. From the table it can be seen that Lazarus tends to improve with black for each doubling. At 1 million play-outs it appears that Lazarus is playing close to perfect, as it never lost a game as Black. Of course only 25 games were played with each color.

Clearly, 5x5 is too simple for such a test since there is such a strong bias for black to win. This is one test where it might make sense to try 25.0 komi and score draws.

Now let's look at 7x7 boards.
The 7x7 and 9x9 games are still in progress, but here is the table of what I have so far in 7x7:

Boardsize: 7x7
Komi:      8.5
Opponent:  gg-3.7.9

K-nodes  Score when Black  Score when White  Combined score
-------  ----------------  ----------------  ----------------
1024     19/19 = 100.00    17/20 =  85.00    36/39 =  92.31
0512     24/24 = 100.00    17/25 =  68.00    41/49 =  83.67
0256     19/19 = 100.00     8/19 =  42.11    27/38 =  71.05
0128     22/22 = 100.00     7/22 =  31.82    29/44 =  65.91
0064     21/21 = 100.00     5/21 =  23.81    26/42 =  61.90
0032     23/23 = 100.00     4/24 =  16.67    27/47 =  57.45
0016     19/21 =  90.48     4/22 =  18.18    23/43 =  53.49
0008     15/24 =  62.50    10/25 =  40.00    25/49 =  51.02
0004      8/23 =  34.78     3/24 =  12.50    11/47 =  23.40
0002      4/17 =  23.53     2/18 =  11.11     6/35 =  17.14
0001      2/15 =  13.33     0/16 =   0.00     2/31 =   6.45

What is interesting about this test is that the superiority of Lazarus over gg-3.7.9 at this board size is substantially greater than with the 5x5 games when I run an ELO rating analysis. Also, Lazarus NEVER lost a game as BLACK once doing at least 32,000 play-outs.

Also, there appears to be a very heavy black bias using komi of 8.5. I don't know the correct komi for these games, but I heard somewhere that the correct komi is 9.0. As in the 5x5 test, since Lazarus doesn't know how to do non-fractional komi, I am forced to choose which side gets the raw end of the komi deal. However, it's still interesting to note that with the white stones Lazarus can still win 85% of the games against gg-3.7.9 at the highest level.

Here is the rating analysis for the 7x7 games. I normalize gg-3.7.9 to be 2000.0 ELO exactly:

PLAYER    RATING
--------  ------
1024      2328.3
0512      2313.8
0256      2246.2
0128      2173.0
0064      2155.1
0032      2098.0
0016      2005.4
gg-3.7.9  2000.0
0008      1910.3
0004      1762.4
0002      1660.5
0001      1377.1

Now for the 9x9 games. My intent is to run more than 50 games for each pairing in this case, but as you see from the table only a few games have been played at the higher levels.
Here is the table for the 9x9 games:

Boardsize: 9x9
Komi:      7.5
Opponent:  gg-3.7.9

K-nodes  Score when Black  Score when White  Combined score
-------  ----------------  ----------------  ----------------
1024      4/4  = 100.00     5/5  = 100.00     9/9  = 100.00
0512      6/7  =  85.71     8/8  = 100.00    14/15 =  93.33
0256      6/7  =  85.71     6/7  =  85.71    12/14 =  85.71
0128     22/27 =  81.48    23/27 =  85.19    45/54 =  83.33
0064     18/26 =  69.23    18/26 =  69.23    36/52 =  69.23
0032     15/28 =  53.57     8/29 =  27.59    23/57 =  40.35
0016     10/27 =  37.04    15/28 =  53.57    25/55 =  45.45
0008     10/28 =  35.71    11/28 =  39.29    21/56 =  37.50
0004      3/28 =  10.71     7/29 =  24.14    10/57 =  17.54
0002      2/26 =   7.69     2/26 =   7.69     4/52 =   7.69
0001      0/26 =   0.00     1/26 =   3.85     1/52 =   1.92

Again, you see a great deal of scalability: gg-3.7.9 easily beats Lazarus at the lower levels, but at deeper levels gg-3.7.9 has little chance.

The rating chart is not as accurate for the 9x9 games due to the low number of games played, but here it is based on what has been played so far:

PLAYER    RATING
--------  ------
1024      2647.6
0512      2444.4
0256      2309.4
0128      2278.3
0064      2140.3
gg-3.7.9  2000.0
0016      1967.9
0032      1931.6
0008      1910.8
0004      1730.4
0002      1573.1
0001      1411.2

It appears that at 9x9 Lazarus needs more play-outs to equalize with gnugo. However, it also appears that at higher levels the superiority is even greater than in the 7x7 games. This is non-intuitive and probably not really the case - I assume this is due to sampling error since fewer games have been played on this boardsize. We also need to see the highest level lose a few games, as this surely distorts the table significantly (you cannot get an accurate ELO rating if you don't win AND lose some games).

I may try 11x11 or 13x13 boards at a later time - focusing on lower levels since the higher levels are very time consuming. I will post an update to this in a few days once I've gathered substantially more data.
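The point about needing both wins and losses can be illustrated with a short Python sketch. It assumes the standard logistic Elo model, where a score fraction p against a single opponent corresponds to a rating difference of 400 * log10(p / (1 - p)). This is not necessarily the method behind the rating tables in this post, so the numbers it prints will not match them (for the 7x7 1024 level it gives roughly 2432 rather than 2328.3). The name `elo_diff` and the `ANCHOR` constant are illustrative only.

```python
import math


def elo_diff(wins: int, games: int) -> float:
    """Rating difference implied by a score against one opponent.

    Assumption: plain logistic Elo model with no prior and no draws.
    """
    if games <= 0:
        raise ValueError("need at least one game")
    p = wins / games
    if p == 0.0:
        return -math.inf  # no wins: the rating is unbounded below
    if p == 1.0:
        return math.inf   # no losses: the rating is unbounded above
    return 400.0 * math.log10(p / (1.0 - p))


ANCHOR = 2000.0  # gg-3.7.9 is fixed at 2000.0, as in the tables above

# Combined scores taken from the 7x7 and 9x9 tables in this post.
for label, wins, games in [("7x7 1024", 36, 39),
                           ("7x7 0008", 25, 49),
                           ("9x9 1024", 9, 9)]:
    print(label, ANCHOR + elo_diff(wins, games))
```

The 9x9 1024 level comes out as infinity under this model because it has not lost a game yet. Any finite rating a tool reports for it depends on whatever prior or smoothing that tool applies, so that entry should be read with caution.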
- Don