Hello Don,
Your work and insight keep amazing me.
If I understand it correctly, the playouts are all made
from the root position?
I am very interested in the results when the playouts are
not all started from the root position, but from the point
where the UCT part of the search ends and the playouts
begin, especially with larger numbers of playouts.
I remember a statement of yours that it made no
significant difference if the playouts are multiplied from
that position instead of the root position, even with
hundreds of playouts...
Do those statements still hold?
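For readers following along, the distinction in the question (descending the UCT tree first, then starting the random playout from the reached leaf rather than from the root) can be sketched like this. This is a generic UCT selection step, not code from either program, and all names are mine:

```python
import math

class Node:
    """A generic UCT tree node (illustrative only)."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.wins = 0
        self.visits = 0

def uct_value(child, c=1.4):
    # Standard UCT: mean result plus an exploration bonus.
    if child.visits == 0:
        return float("inf")
    exploit = child.wins / child.visits
    explore = c * math.sqrt(math.log(child.parent.visits) / child.visits)
    return exploit + explore

def select_leaf(root):
    # Walk down the tree by UCT value; the random playout is then run
    # from the position reached here, not from the root position.
    node = root
    while node.children:
        node = max(node.children, key=uct_value)
    return node
```

Whether extra playouts are launched from `root` or from the node returned by `select_leaf` is exactly the distinction the question asks about.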
Edward de Grijs.
From: Don Dailey <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED], computer-go <computer-go@computer-go.org>
To: computer-go <computer-go@computer-go.org>
Subject: [computer-go] The physics of Go playing strength.
Date: Sat, 07 Apr 2007 21:05:19 -0400
A few weeks ago I announced that I was doing a long term
scalability study with computer go on 9x9 boards.
I have constructed a graph of the results so far:
http://greencheeks.homelinux.org:8015/~drd/public/study.jpg
Although I am still collecting data, I feel that I have
enough samples to report some results - although I will
continue to collect samples for a while.
This study is designed to measure the improvement in
strength that can be expected with each doubling of computer
resources.
I'm actually testing 2 programs - both of them UCT style go
programs, but one of those programs does uniformly random
play-outs and the other much stronger one is similar to
Mogo, as documented in one of their papers.
Dave Hillis coined the terminology I will be using: light
play-outs vs. heavy play-outs.
For the study I'm using 12 versions of each program. The
weakest version starts with 1024 play-outs in order to
produce a move. The next version doubles this to 2048
play-outs, and so on until the 12th version which does 2
million (2,097,152) playouts. This is a substantial study
which has taken weeks so far to get to this point.
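The twelve versions described above form a simple geometric doubling schedule; a quick sketch (the variable name is mine):

```python
# Twelve playout budgets, doubling from 1024 up to 2**21 = 2,097,152.
budgets = [1024 * 2**i for i in range(12)]
print(budgets[-1])  # 2097152
```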
Many of the faster programs have played close to 250 games,
but the highest levels have only played about 80 games so
far.
The scheduling algorithm is very similar to the one used by
CGOS. An attempt is made not to waste a lot of time playing
seriously mis-matched opponents.
The games were rated and the results graphed. You can see
the result of the graph here (which I also included near the
top of this message):
http://greencheeks.homelinux.org:8015/~drd/public/study.jpg
The x-axis is the number of doublings starting with 1024
play-outs and the y-axis is the ELO rating.
The public domain program GnuGo version 3.7.9 was assigned
the rating 2000 as a reference point. On CGOS, this program
has achieved 1801, so in CGOS terms all the ratings are
about 200 points optimistic.
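The anchoring above amounts to a constant shift between the two rating scales; a minimal sketch, where the function name and the constant-offset assumption are mine (Elo scales are only defined up to an additive shift anyway):

```python
# The study anchors GnuGo 3.7.9 at 2000 Elo; on CGOS the same
# program rates 1801, so study ratings run roughly 200 points high.
STUDY_ANCHOR = 2000
CGOS_RATING = 1801
offset = STUDY_ANCHOR - CGOS_RATING  # 199, i.e. "about 200 points"

def study_to_cgos(rating):
    # Assumption: a single constant shift relates the two scales.
    return rating - offset
```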
Feel free to interpret the data any way you please, but here
are my own observations:
1. Scalability is almost linear with each doubling.
2. But there appears to be a very gradual fall-off with
time - which is what one would expect (ELO
improvements cannot be infinite so they must be
approaching some limit.)
3. The heavy-playout version scales at least as well,
if not better, than the light play-out version.
(You can see the rating gap between them gradually
increase with the number of play-outs.)
4. The curve is still steep at 2 million play-outs; this
is convincing empirical evidence that there are a few
hundred ELO points worth of improvement possible
beyond this.
5. GnuGo 3.7.9 is not competitive with the higher levels of
Lazarus. However, what the study doesn't show is that
Lazarus needs 2X more thinking time to play equal to
GnuGo 3.7.9.
This graph explains why I feel that absolute playing
strength is a poor conceptual model of how humans or
computers play go. If Lazarus were running on the old Z-80
processors of a few decades ago, it would be viewed as an
incredibly weak program, but running on a supercomputer it's
a very strong program. But in either case it's the SAME
program. The difference is NOT the amount of work each
system is capable of, it's just that one takes longer to
accomplish a given amount of work. It's much like the
relationships between power, work, force, time etc. in
physics.
Based on this type of analysis and the physics analogy,
GnuGo 3.7.9 is a stronger program than Lazarus (even at 9x9
go). Lazarus requires about 2X more time to equalize. So
Lazarus plays with less "force" (if you use the physics
analogy) and needs more TIME to get the same amount of work
done.
ELO is treated numerically as if it were "work" in physics
because when it's measured by playing games, both players
get the same amount of time. The time factor cancels out,
but that causes us to forget that time is part of the equation.
On CGOS, Lazarus and FatMan are the same program, but one
does much more work and they have ELO ratings that differ by
almost 300 ELO points. Even though they are the same
program, you will look on CGOS and believe Lazarus is much
stronger because you have not considered the physics of Go
playing strength.
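For scale, a rating gap like the roughly 300 points mentioned above translates into an expected score via the standard Elo logistic formula (this is the general Elo formula, not anything specific to CGOS):

```python
def expected_score(elo_diff):
    # Standard Elo expectation: win probability (draws counted as
    # half a point) for the higher-rated side, given the gap in points.
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# A ~300-point gap corresponds to winning roughly 85% of the games.
print(round(expected_score(300), 2))  # 0.85
```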
- Don
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/