Re: [computer-go] The physics of Go playing strength.
On Sunday 8 April 2007 03:05, Don Dailey wrote:
> A few weeks ago I announced that I was doing a long term
> scalability study with computer go on 9x9 boards.
>
> I have constructed a graph of the results so far:
>
> http://greencheeks.homelinux.org:8015/~drd/public/study.jpg

Thanks for this interesting study.

[snip]

> Feel free to interpret the data any way you please, but here
> are my own observations:
>
>    1. Scalability is almost linear with each doubling.
>
>    2. But there appears to be a very gradual fall-off with
>       time - which is what one would expect (ELO
>       improvements cannot be infinite so they must be
>       approaching some limit.)

Couldn't the inflection of the heavy curve also mean that the advantage of heavy play-outs disappears when the number of simulations is very high? With a huge number of simulations the heavy player could even become weaker than the light player, due to the "wrong knowledge" introduced in the play-outs. Sadly it seems hard to test this (levels 12-13-14) without huge computing power, a distributed [EMAIL PROTECTED], or a great deal of patience :-)

>    3. The heavy-playout version scales at least as well,
>       if not better, than the light play-out version.
>
>       (You can see the rating gap between them gradually
>       increase with the number of play-outs.)

Between 10 and 11 the trend changes.

Regards.
Alain
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] LISP question (little bit off topic)
Paper 1 in the list below states: "Numbers were originally implemented in Lisp I as a list of atoms." And the Lisp 1.5 manual states: "Arithmetic in Lisp 1.5 is new." Could you give an example of how the number 3 was implemented in Lisp-1, and how 2+1? So far I have found only these remarks but no programming examples. It would be much more instructive for my article if I could quote such examples.

Chrilly

- Original Message -
From: "Ron Goldman" <[EMAIL PROTECTED]>
To: "computer-go"
Cc: "Chrilly" <[EMAIL PROTECTED]>
Sent: Sunday, April 08, 2007 2:23 AM
Subject: Re: [computer-go] LISP question (little bit off topic)

Chrilly,

I used to program in LISP and had never heard of this, so I did some checking. I think this is a misconception arising from the fact that numbers were considered atoms and hence stored on the list of atoms. Instead of just being a numeric value, they consisted of an association list (i.e. a list of atoms) containing a tag to indicate the value was a number and another word with the value. The LISP I Programmers Manual [1] gives an example:

-1 => (MINUS . (ASSOC NUMB FLO (1.0)))

(In fact LISP I (1960) only supported floating-point numbers; LISP 1.5 (1961) supported both integers & floats. [2])

As a result of storing values in an association list, arithmetic routines had to do several memory references to obtain the numeric value. In a paper on the "History of Lisp", John McCarthy [3] discussed this, writing that "Numbers were originally implemented in LISP I as lists of atoms, and this proved too slow for all but the simplest computations. A reasonably efficient implementation of numbers as atoms in S-expressions was made in LISP 1.5, but in all the early LISPs, numerical computations were still 10 to 100 times slower than in FORTRAN."

Later versions of LISP [4] used better tagging schemes for numbers and were able to produce compiled code that was as fast (or faster) than C or FORTRAN. Finally, LISP early on had bignums to compute using arbitrary-precision integers (similar to Java's BigInteger). Useful if you needed to compute the factorial of 1000 exactly.

-- Ron
--
1. http://community.computerhistory.org/scc/projects/LISP/book/LISP%20I%20Programmers%20Manual.pdf
2. ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-024.pdf
3. http://www-formal.stanford.edu/jmc/history/lisp.ps
4. http://doi.acm.org/10.1145/1086803.1086804

On Apr 7, 2007, at 12:54 PM, Chrilly wrote:

To my knowledge the first Lisp versions had no number system. The number n was represented as the list of numbers from 1 to n (which is also the mathematical/axiomatic definition of the natural numbers). But it's not very practical. Can anyone provide me with a link describing how this was done? I speak some computer languages, but Lisp is not among them. I want to present the code in an article for the Austrian AI Journal (as an example that mathematical elegance and practical usefulness are two different things).

Chrilly
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] The physics of Go playing strength.
Thanks Don for producing these fascinating results. I hope that when you have finished the study, you will show us not just this graph, but also the game results (number of wins) that it is derived from.

At 02:05 08/04/2007, you wrote:
> A few weeks ago I announced that I was doing a long term
> scalability study with computer go on 9x9 boards.
>
> I have constructed a graph of the results so far:
>
> http://greencheeks.homelinux.org:8015/~drd/public/study.jpg
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] The physics of Go playing strength.
According to these results the slope is considerably greater than in chess. In the classical experiment of Ken Thompson, searching 1 ply deeper is worth about 200 Elo, and 1 ply corresponds to 5-6 times longer/faster. In 9x9 a factor of 2 already gives the same improvement. This is really remarkable.

Another explanation would be that 100 Elo has a different meaning in Go than in chess. It is often argued that the distance between weak and strong players is much greater in Go than in chess. In chess the distance between an average club player and top humans is about 1000 Elo. Maybe in Go it's 2000 Elo?? In chess the green level-11 version would have world-champion level.

Is it enough to make a 2 million playout version to beat the top dans in 9x9? Is it that easy? Just build a special purpose chip like ChipTest aka Deep Blue. Or implement it on a cluster. Or just wait a few years and do it on a PC. Or a PlayStation.

Chrilly

Is there any notion of the Elo rating of a professional Go player? In chess terms the

- Original Message -
From: "Don Dailey" <[EMAIL PROTECTED]>
To: "computer-go"
Sent: Sunday, April 08, 2007 3:05 AM
Subject: [computer-go] The physics of Go playing strength.

A few weeks ago I announced that I was doing a long term scalability study with computer go on 9x9 boards.

I have constructed a graph of the results so far:

http://greencheeks.homelinux.org:8015/~drd/public/study.jpg

Although I am still collecting data, I feel that I have enough samples to report some results - although I will continue to collect samples for a while.

This study is designed to measure the improvement in strength that can be expected with each doubling of computer resources.

I'm actually testing 2 programs - both of them UCT style go programs, but one of those programs does uniformly random play-outs and the other much stronger one is similar to Mogo, as documented in one of their papers.

Dave Hillis coined the terminology I will be using, light play-outs vs heavy play-outs.

For the study I'm using 12 versions of each program. The weakest version starts with 1024 play-outs in order to produce a move. The next version doubles this to 2048 play-outs, and so on until the 12th version which does 2 million (2,097,152) playouts. This is a substantial study which has taken weeks so far to get to this point.

Many of the faster programs have played close to 250 games, but the highest levels have only played about 80 games so far.

The scheduling algorithm is very similar to the one used by CGOS. An attempt is made not to waste a lot of time playing seriously mis-matched opponents.

The games were rated and the results graphed. You can see the result of the graph here (which I also included near the top of this message):

http://greencheeks.homelinux.org:8015/~drd/public/study.jpg

The x-axis is the number of doublings starting with 1024 play-outs and the y-axis is the ELO rating.

The public domain program GnuGo version 3.7.9 was assigned the rating 2000 as a reference point. On CGOS, this program has achieved 1801, so in CGOS terms all the ratings are about 200 points optimistic.

Feel free to interpret the data any way you please, but here are my own observations:

   1. Scalability is almost linear with each doubling.

   2. But there appears to be a very gradual fall-off with time - which is what one would expect (ELO improvements cannot be infinite so they must be approaching some limit.)

   3. The heavy-playout version scales at least as well, if not better, than the light play-out version. (You can see the rating gap between them gradually increase with the number of play-outs.)

   4. The curve is still steep at 2 million play-outs; this is convincing empirical evidence that there are a few hundred ELO points worth of improvement possible beyond this.

   5. GnuGo 3.7.9 is not competitive with the higher levels of Lazarus. However, what the study doesn't show is that Lazarus needs 2X more thinking time to play equal to GnuGo 3.7.9.

This graph explains why I feel that absolute playing strength is a poor conceptual model of how humans or computers play go. If Lazarus were running on the old Z-80 processors of a few decades ago, it would be viewed as an incredibly weak program, but running on a supercomputer it's a very strong program. But in either case it's the SAME program. The difference is NOT the amount of work each system is capable of, it's just that one takes longer to accomplish a given amount of work. It's much like the relationships between power, work, force, time etc. in physics.

Based on this type of analysis and the physics analogy, GnuGo 3.7.9 is a stronger program than Lazarus (even at 9x9 go). Lazarus requires about 2X more time to equalize. So Lazarus plays with less "force" (if you use the physics analogy) and needs more TIME to get the same amount of work done. ELO is treated numerically as if it were "work" in physics because when it's measured by playing games, both players get the same amount of time. The time factor cancels out but it causes us to ignore that it's part of the equation.
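For readers converting these Elo differences into game outcomes: under the standard logistic Elo model (the usual chess/CGOS convention, given here only as background, not as a result from the study), a rating difference D gives the stronger player an expected score of

    E = \frac{1}{1 + 10^{-D/400}}

so a 100 point gap corresponds to roughly a 64% score, and a 200 point gap to roughly 76%.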
Re: [computer-go] The physics of Go playing strength.
The discussion here http://senseis.xmp.net/?EloRating suggests that the difference between beginners and top players in go is about 3000 ELO on a 19x19 board. This difference is very dependent on the board size. I can think of a naive argument that this difference should scale linearly with the (linear) size of the board, that is, as the square root of the area of the board.

At 08:56 08/04/2007, you wrote:
> According to these results the slope is considerably greater than in chess. In the classical experiment of Ken Thompson, searching 1 ply deeper is worth about 200 Elo, and 1 ply corresponds to 5-6 times longer/faster. In 9x9 a factor of 2 already gives the same improvement. This is really remarkable.
>
> Another explanation would be that 100 Elo has a different meaning in Go than in chess. It is often argued that the distance between weak and strong players is much greater in Go than in chess. In chess the distance between an average club player and top humans is about 1000 Elo. Maybe in Go it's 2000 Elo?? In chess the green level-11 version would have world-champion level.
>
> Is it enough to make a 2 million playout version to beat the top dans in 9x9? Is it that easy? Just build a special purpose chip like ChipTest aka Deep Blue. Or implement it on a cluster. Or just wait a few years and do it on a PC. Or a PlayStation.
>
> Chrilly

[snip]
Re: [computer-go] MoGo
On Sat, Apr 07, 2007 at 04:03:47PM -0400, Don Dailey wrote:
>
> On Sat, 2007-04-07 at 14:36 -0400, Matt Kingston wrote:
> > What I want from a commercial go playing program is one that I can use
> > to learn to be a better go player. [...]
>
> I have heard this argument before, but I personally do not
> subscribe to it. Please let me explain why.

Here I must side with Matt Kingston. If I were learning Go only from playing a computer program, I would like to learn a bit of endgame technique, even if I am losing anyway. That would be impossible with a program that aims to win by precisely 0.5 points. Likewise with the middle game. As long as the computer wins the game in the opening, it can play whatever strange moves in the middle game, and I will have a hard time learning anything from it.

> Of what learning purpose is it if you are losing the game and
> the computer gets you to focus on a dramatic battle somewhere
> that is totally irrelevant?

Maybe not a 'dramatic battle', but good habits like playing sente yose before gote yose, and not playing a first-line hane if you are not going to come back and connect it.

> In fact this is how beginners think about the game. It doesn't
> seem to me like a good learning aid to try to get the computers
> to "emulate" the losing strategy weaker players use.

Weaker players cannot estimate the score until very late in the game. Not with enough precision, anyway. Thus, most of the time they have no idea if they are winning or losing by 0.5 points. Then the most obvious strategy must be to maximize your score, so that even in case of an error in the evaluation or an error in the endgame, the result would still be favourable. This ought to apply to computer programs too, as long as we have much uncertainty in the evaluation functions.

- Heikki

--
Heikki Levanto   "In Murphy We Turst"   heikki (at) lsd (dot) dk
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] The physics of Go playing strength.
On Sun, 2007-04-08 at 00:44 -0400, [EMAIL PROTECTED] wrote:
> I question whether it's valid to make this kind of comparison when
> Gnugo scales so differently from UCT. If you froze one of the UCT
> programs at 1 million playouts/move and then tried to scale gnugo until
> it matched that level of strength, that might not even be possible. I
> think you need to compare 2 curves (as you're doing with the lite and
> heavy UCT programs), not one curve and one fixed point.

I'm not really comparing the scalability of GnuGo; I'm treating it as a fixed point. I'm also trying to "abstract" the concept of strength/time as a unified concept whether a program scales or not. In other words, the strength of a program is almost meaningless without a consideration of time.

When people compare 2 programs and say this one is stronger than that one, there is an unspoken assumption that they are running on similar hardware with similar time being used. (And this is how it should be, but they are almost oblivious to the time factor.) So I think it's more correct to say GnuGo is stronger at 9x9 Go even though Lazarus beats it on CGOS. No one would question this if I were running Lazarus on a 486; they would just say GnuGo is stronger.

Of course I agree that GnuGo has different scalability properties, so it's a slippery concept. I don't know if GnuGo would win if I cranked up the level on it. But at least at fast time controls, GnuGo has a more efficient underlying engine and "in principle" it might scale better.

I'm not saying we should throw out our concept of ELO strength - but I am saying that it's not 1 dimensional. In other words, you have to give 2 parameters, e.g. "Gnugo is stronger than Lazarus playing at 1 minute per move on a pentium 2.4", etc.

If a program is known to be "fixed" without the ability to adjust parameters or levels, it is sufficient to speak as if it's an absolute thing without any regard to time, because the time factor is implicit and understood. Therefore, you will occasionally hear statements like, "program X is pretty strong, but it takes quite a bit of time for it to play a move."

- Don
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] The physics of Go playing strength.
On Sun, 2007-04-08 at 09:43 +0200, alain Baeckeroot wrote:
> Couldn't the inflection of the heavy curve also mean that the advantage of
> heavy play-outs disappears when the number of simulations is very high?
> With a huge number of simulations the heavy player could even become weaker
> than the light player, due to the "wrong knowledge" introduced in the play-outs.
> Sadly it seems hard to test this (levels 12-13-14) without huge computing power,
> a distributed [EMAIL PROTECTED], or a great deal of patience :-)

Eventually, the gap would have to close again as both versions approach perfect play. So even if it widens for a while, that cannot continue forever, because in the end the two curves will touch.

- Don
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] The physics of Go playing strength.
On Sun, 2007-04-08 at 09:56 +0200, Chrilly wrote:
> Is it enough to make a 2 million playout version
> to beat the top dans in 9x9? Is it that easy?

Of course the ELO numbers are arbitrary. I assigned GnuGo 3.7.9 a level of 2000 but on CGOS it is 1800. But CGOS numbers are arbitrary too. If you can estimate the ELO rating of GnuGo 3.7.9 compared to, say, a 1 dan player, then you can estimate the winning percentages of any version against a human of a given strength.

Mogo of course is about 200-300 higher at the same levels. So I believe that if special purpose hardware could make MoGo a few times faster, you really would have a professional strength 9x9 player.

It might be easier to do this with a lite play-out version if you can get substantially more speed, say 4X faster, due to the much simpler move logic. I don't know hardware, but it could be easier (more opportunities for parallelism) to do it with a heavy version.

- Don
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] The physics of Go playing strength.
On Sun, 2007-04-08 at 09:36 +0100, Tom Cooper wrote:
> Thanks Don for producing these fascinating results. I hope that
> when you have finished the study, you will show us not just this
> graph, but also the game results (number of wins) that it is
> derived from.

I have all games and all data if anyone wants them. I would probably build a crosstable of results for each pairing in a final report on a web page.

- Don
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] The physics of Go playing strength.
On Sun, Apr 08, 2007 at 08:32:48AM -0400, Don Dailey wrote: > Eventually, the curve would have to close back up as both versions > approach perfect play. So even if the gap widens for a while, this > cannot stay forever because in the end they will touch. Aren't you being a bit optimistic here? It is quite conceivable that the curves will flatten out and reach a maximum level somewhat below perfect play. I don't see how we can predict the difference between them at that time. - Heikki -- Heikki Levanto "In Murphy We Turst" heikki (at) lsd (dot) dk ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] MoGo
On Sun, 2007-04-08 at 11:24 +0200, Heikki Levanto wrote:
> > In fact this is how beginners think about the game. It doesn't
> > seem to me like a good learning aid to try to get the computers
> > to "emulate" the losing strategy weaker players use.
>
> Weaker players cannot estimate the score until very late in the game.
> Not with enough precision, anyway. Thus, most of the time they have no
> idea if they are winning or losing by 0.5 points.

But the whole idea is to take you PAST this level of understanding.

> Then the most obvious
> strategy must be to maximize your score, so that even in case of an
> error in the evaluation or an error in the endgame, the result would
> still be favourable.

Again, this is probably a good strategy for beginners, but the idea is to get you beyond this point. That's why we are talking about a program that you can LEARN from. LEARN means get better. So perhaps it is the case that a dumbed down version is better initially, but it may not help you get past this conceptual barrier.

> This ought to apply to computer programs too, as
> long as we have much uncertainty in the evaluation functions.

But it doesn't. I have not seen maximizing the total won territory improve a program. It's more important to actually understand what is going on - map out specifically what you need to win and not worry about anything else. You should fight for the win, and this is not a difficult concept - it will make you much better.

- Don
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] The physics of Go playing strength.
On Sun, 2007-04-08 at 14:44 +0200, Heikki Levanto wrote: > Aren't you being a bit optimistic here? It is quite conceivable that > the > curves will flatten out and reach a maximum level somewhat below > perfect > play. I don't see how we can predict the difference between them at > that > time. UCT has been proven to converge to optimum play. Read some of the papers. - Don ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
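For reference, the selection rule behind this convergence result (UCB1, as used in the UCT papers Don refers to) has each node pick the child j that maximizes

    \bar{X}_j + C \sqrt{\frac{\ln N}{n_j}}

where \bar{X}_j is child j's average playout result, n_j its visit count, N the parent's visit count, and C an exploration constant. Because the second term grows whenever a child is neglected, every move keeps being sampled, which is what drives the proof that the search converges to optimal play in the limit.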
Re: [computer-go] MoGo
On Sun, Apr 08, 2007 at 08:48:03AM -0400, Don Dailey wrote:
>
> On Sun, 2007-04-08 at 11:24 +0200, Heikki Levanto wrote:
> > Weaker players cannot estimate the score until very late in the game.
> > Not with enough precision, anyway. Thus, most of the time they have no
> > idea if they are winning or losing by 0.5 points.
>
> But the whole idea is to take you PAST this level of understanding.

I bet most of the go-playing population (humans and computer programs alike) is quite incapable of estimating the final score on a 19x19 board except in the yose. Perhaps dan-level amateurs can do it in the late middle game. But show me the player who can say at move 30 that playing here will lead to a 0.5 point victory, and playing there will lose the game by a point and a half! Or show me any professional who claims to know the final score before the tenth move! Such people don't need to play go; they have already solved the game!

- Heikki

--
Heikki Levanto   "In Murphy We Turst"   heikki (at) lsd (dot) dk
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] The physics of Go playing strength.
The question here is not about UCT (yes, it gives the same results as alpha-beta). It's about MC scoring. It has not been proved that the MC score will generate optimum play with a large enough number of simulations. Now the best supercomputer uses about 500,000 CPUs, which is 2 to the 18th power of computation increase. Don's curve can be tested up to number 18 now.

-Original Message-
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: computer-go@computer-go.org
Sent: Sun, 8 Apr 2007 7:49 AM
Subject: Re: [computer-go] The physics of Go playing strength.

On Sun, 2007-04-08 at 14:44 +0200, Heikki Levanto wrote:
> Aren't you being a bit optimistic here? It is quite conceivable that the
> curves will flatten out and reach a maximum level somewhat below perfect
> play. I don't see how we can predict the difference between them at that
> time.

UCT has been proven to converge to optimum play. Read some of the papers.

- Don
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
[computer-go] The success of UCT
The factorial of 81 is about 10^121. The number of legal positions may be some root of that. It's still a huge number. The number of simulations used in UCT programs is several hundred thousand, which is tiny compared to the total possible number of plays. How can they still be so successful? I think the only explanation is that the evaluation values (at least for MC) are highly structured. The question that needs answering is whether the scale of this structure is proportional to n x n (n is the board length). If it is, that is very hopeful for computer Go on 19x19 using the UCT-MC approach.

Daniel Liu
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] The success of UCT
On 4/8/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: The factorial of 81 is about 10^140. The number of legal positions may be it may be 103919148791293834318983090438798793469 regards, -John ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
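A quick sanity check on these orders of magnitude (a throwaway calculation, not something from the thread):

    import math

    naive_bound = math.factorial(81)  # all orderings of 81 moves, ignoring captures and passes
    legal_9x9 = 103919148791293834318983090438798793469  # John Tromp's count quoted above

    print(math.log10(naive_bound))    # ~120.8, i.e. 81! is roughly 10^121
    print(math.log10(legal_9x9))      # ~38.0,  i.e. roughly 10^38

So the legal-position count is vastly smaller than the naive 81! figure, while both dwarf the few hundred thousand simulations a UCT program actually runs per move.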
RE: [computer-go] The physics of Go playing strength.
Hello Don,

> A few weeks ago I announced that I was doing a long term
> scalability study with computer go on 9x9 boards.
> I have constructed a graph of the results so far:

Your work and insight keep on amazing me.

If I understand it correctly, the playouts are all made from the root position? I am very interested in the results if the larger number of playouts is made not from the root position, but from the moment the UCT part is over and the playouts begin, especially with larger numbers of playouts. I remember a statement of yours that it made no significant difference if the playouts are multiplied from that position instead of the root position, even with hundreds of playouts... Do those statements still hold?

Edward de Grijs.

> From: Don Dailey <[EMAIL PROTECTED]>
> Reply-To: [EMAIL PROTECTED], computer-go
> To: computer-go
> Subject: [computer-go] The physics of Go playing strength.
> Date: Sat, 07 Apr 2007 21:05:19 -0400
>
> A few weeks ago I announced that I was doing a long term
> scalability study with computer go on 9x9 boards.
>
> I have constructed a graph of the results so far:
>
> http://greencheeks.homelinux.org:8015/~drd/public/study.jpg

[snip]
Re: [computer-go] MoGo
On 4/7/07, Don Dailey <[EMAIL PROTECTED]> wrote:
> Of what learning purpose is it if you are losing the game and
> the computer gets you to focus on a dramatic battle somewhere
> that is totally irrelevant?

I think the issue is that once the computer has a won position, all moves seem almost equally irrelevant, so the computer plays randomly. Ideally the other player would also understand the position and resign, but if the other player doesn't understand, then ideally the computer should clarify by playing moves that demonstrate the strength of the position. That doesn't mean playing randomly or getting into irrelevant battles (unless the other player needs to play something out to see it). It does mean showing the strength of the stones already on the board in a way that the other player will understand, in a minimum number of moves.

But that's hard because clarity isn't something that's just a consequence of the game rules; it also requires some sort of knowledge of how humans understand Go. It seems like something to take on after the main challenge is solved. Perhaps using Japanese rules would be a step in this direction, since they discourage play whose irrelevance is more apparent to humans, making it more likely that the computer will make a clarifying move.

Although I suppose that, if you know a little about how a computer opponent works, random play can be taken as an indication that it thinks the game is over. Perhaps it's a matter of humans understanding better what the computer is saying. Although recognizing that the computer thinks the game is over and understanding why are two different things.

> And go really is all about that - knowing what is really important
> and what is not and I would rather learn from a player who
> demonstrates that to me.

Well, yes.

- Brian
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] The success of UCT
You don't have to see to the end of the game to play well. They have done studies in computer chess which show the number of times a deeper search changes its mind. It becomes smaller and smaller with depth - presumably because most of the moves are already optimal. Also, it is clear (in chess) that computers play the VERY BEST move most of the time, as good humans do. At the upper levels of skill, the difference between a good and a great player is the occasional move.

In my UCT study, you will notice that the highest level is significantly stronger than the levels below it. However, MOST of the time it plays the same move it would have selected after just 1000 play-outs. In other words, most of its superiority is centered around a few key moves. The 2 million node version RARELY plays a different move than the 1 million node version - perhaps just a couple of times per game on average. And yet that is enough to demonstrate significant superiority. In most chess games between grandmasters, you can often identify ONE move that made the difference between winning and losing.

- Don

On Sun, 2007-04-08 at 11:46 -0400, [EMAIL PROTECTED] wrote:
> The factorial of 81 is about 10^121. The number of legal positions may
> be some root of that. It's still a huge number. The number of
> simulations used in UCT programs is several hundred thousand, which is
> tiny compared to the total possible number of plays. How can they
> still be so successful? I think the only explanation is that the
> evaluation values (at least for MC) are highly structured. The question
> that needs answering is whether the scale of this structure is
> proportional to n x n (n is the board length). If it is, that is very
> hopeful for computer Go on 19x19 using the UCT-MC approach.
>
> Daniel Liu
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
RE: [computer-go] The physics of Go playing strength.
On Sun, 2007-04-08 at 18:11 +0200, Edward de Grijs wrote:
> Hello Don,
>
> > A few weeks ago I announced that I was doing a long term
> > scalability study with computer go on 9x9 boards.
> > I have constructed a graph of the results so far:
>
> Your work and insight keep on amazing me.
>
> If I understand it correctly, the playouts are all made from the
> root position? I am very interested in the results if the larger
> number of playouts is made not from the root position, but from
> the moment the UCT part is over and the playouts begin, especially
> with larger numbers of playouts. I remember a statement of yours
> that it made no significant difference if the playouts are
> multiplied from that position instead of the root position, even
> with hundreds of playouts... Do those statements still hold?
>
> Edward de Grijs.

Hi Edward,

Thank you for the kind remarks.

I count the total play-outs - version 0 does 1024 play-outs, period, and then the search is stopped cold and a move returned. It makes the program weaker if you just do N play-outs from the leaf position of UCT.

But it does not seem to hurt the program at all to not expand the existing CHILDREN of a node until the parent has been visited, say, 100 times. I have not tried anything larger than 100, but if it's weaker with 100, I haven't been able to measure it. This really is a huge benefit in memory usage, so I like 100. Of course this means some children will get 2 or more visits before getting expanded. There is no need for this if you have memory to burn.

I'm not sure what experiment you are proposing, but at some point it might be interesting to see how other values work. If you have a computer with very little memory to spare, you could probably use much larger numbers, although at some point you would surely experience a noticeable weakening.

If you are proposing doing 2, 3, 4 or more play-outs at the point where you normally do one, I think it strengthens the program - but not enough to justify the extra work. In other words, doing 2 play-outs doubles the amount of time spent on a move and it does not increase the strength enough to justify this.

- Don

[snip]
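To make the delayed-expansion idea concrete, here is a minimal Python sketch (my own illustration, not Lazarus code; the state interface - successors(), player_to_move() - and the playout() function are placeholder assumptions):

    import math

    EXPAND_THRESHOLD = 100   # "visit the parent ~100 times before expanding", as above
    UCT_C = 1.0              # exploration constant (an assumption, not a tuned value)

    class Node:
        def __init__(self, state, player_just_moved):
            self.state = state                     # game position (black box here)
            self.player_just_moved = player_just_moved
            self.children = []                     # created lazily, see expand_if_ready
            self.visits = 0
            self.wins = 0                          # wins for player_just_moved

        def expand_if_ready(self):
            # Memory-saving trick discussed above: only allocate children once
            # this node has been visited often enough to deserve a subtree.
            if not self.children and self.visits >= EXPAND_THRESHOLD:
                self.children = [Node(s, self.state.player_to_move())
                                 for s in self.state.successors()]

        def select_child(self):
            # UCB1: exploit the observed win rate, explore rarely visited children.
            log_n = math.log(self.visits)
            return max(self.children,
                       key=lambda c: (c.wins / c.visits if c.visits else float("inf"))
                                     + UCT_C * math.sqrt(log_n / max(c.visits, 1)))

    def uct_simulation(root, playout):
        """One UCT iteration: descend the tree, run one playout, back up the result."""
        path, node = [root], root
        while node.children:                       # descend until an unexpanded node
            node = node.select_child()
            path.append(node)
        winner = playout(node.state)               # heavy or light playout; returns a player id
        for n in path:                             # backpropagation
            n.visits += 1
            if n.player_just_moved == winner:
                n.wins += 1
            n.expand_if_ready()

Until a node reaches the threshold, every simulation through it goes straight to a playout, which is why memory use drops sharply while the measured strength barely changes.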
Re: [computer-go] MoGo
I'm not really fanatic about this either way. If I knew how to easily "fix" this, I probably would provide a mode to make it work either way. I see it as aesthetically MORE pleasing when the program appears to be playing stupid, but then you realize that YOU are the one that doesn't understand. When I first started studying chess as a little boy, I was completely annoyed that the grandmasters never played the game out but resigned. I realize now that it is not aesthetically pleasing to pretend you don't realize the game is over. I feel this way a little bit here (but not enough to lose any sleep over.) It's almost ugly (to me) for it to pretend to fight for something that doesn't matter - almost childishly. But of course I agree that from a stylistic point of view you can see the cleanup phase as just a continuation of good habit. - Don On Sun, 2007-04-08 at 11:02 -0600, Arend Bayer wrote: > > > On 4/7/07, Don Dailey <[EMAIL PROTECTED]> wrote: > On Sat, 2007-04-07 at 14:36 -0400, Matt Kingston wrote: > > What I want from a commercial go playing program is one that > I can use > > to learn to be a better go player. This brings up two > important > > deficiencies in the "win by 0.5" strategy. If I'm always > loosing by > > half a point, It's difficult for me to see when I'm playing > well and > > when I'm playing poorly. If the computer doesn't exploit my > weaknesses > > because it knows that it will win anyway, then I won't learn > to defend > > properly. I'll never know if I "almost won" or if the > computer was > > just toying with me the whole way. The feedback from "the > computer > > just killed everything" can help me play better. > > Hi Matt, > > I have heard this argument before, but I personally do not > subscribe to it. Please let me explain why. > > Of what learning purpose is it if you are losing the game and > the computer gets you to focus on a dramatic battle somewhere > that is totally irelevant? What are you being taught? I > think you would develop the wrong sense of the game. Instead > of thinking and planning for a win, you would be thinking and > planning for local battles - and as UCT and MC have shown us, > this is the WRONG way to think about the game. > > I am mostly with Matt on this one. Take endgame: If you only ever > fight for every point in the endgame when it actually matters, then > you get very few opportunities to learn a precise endgame. 0.5 leads > before the endgame are just extremely rare unless the players are at > professional level. Also, in a typical 19x19 endgame, when you are > down by 3 points and playing against an amateur, your best chance is > just to fight for every single point and hope your opponent makes 2-3 > small slips that add up to 3-4 points. I.e. in a realistic game, > maximizing your winning percentage is often equivalent to just playing > for a small loss for now; so the UCT all-or-nothing approach is just > almost never the right thing to do for humans (unless the position is > truly hopeless). > > There are also aesthetical reasons why I prefer professional handling > of lost games over UCT's, but that's another email. > > Arend > > > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] The physics of Go playing strength.
On Sun, 2007-04-08 at 10:09 -0400, [EMAIL PROTECTED] wrote:
> The question here is not about UCT (yes, it gives the same results as
> alpha-beta). It's about MC scoring. It has not been proved that the MC
> score will generate optimum play with a large enough number of simulations.

MC is obviously wrong as an evaluation function - it is not guaranteed to return a correct or even a good score no matter how many simulations. However, UCT eventually turns into a pure tree where MC is not a factor. These programs, in theory, will play perfect Go given enough time.

- Don

> Now the best supercomputer uses about 500,000 CPUs, which is 2 to the
> 18th power of computation increase. Don's curve can be tested up to
> number 18 now.
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
[computer-go] CGOS
Just a warning - tomorrow I'm taking the old CGOS down and replacing it with the NEW cgos. Also, the CGOS test I'm running from my home computer will go away. So the 2 minute server will also go away!

The new cgos on boardspace will use a 5 minute time control instead of the previous 10 minutes.

I will save the games on the 2 minute server for a few weeks, so if anyone wants them for any reason - let me know ASAP, because I probably won't keep them around forever.

I will archive the games of the old CGOS, but I will be taking them off the boardspace server to make room for the new server. If anyone has space and would like to archive all the old CGOS games, let me know. Otherwise, I will keep them for a year or two (longer if it's convenient) and they will be available by request.

I've also improved the viewing client a bit. I will make all this available tomorrow.

- Don
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Simplified MC evaluator ¿explained?
I will try to explain it better. What had puzzled me until recently is how to combine two facts:

a. As Jason says, "A single MC playout corresponds to a Bernoulli trial with probability p", and therefore p-hat is distributed as a binomial, which converges to the normal.

b. The playout itself is a stochastic process which has a variance. That variance depends on the length of the playout.

These two _facts_ seem contradictory, and it is not obvious how to introduce the variance (which is crucial in any MC, or more generally in any stochastic evaluator) into our model. The variance of the stochastic process is not to be mixed up with the distribution of the error of a repeated Bernoulli experiment! Explanation "a", by itself, does not contemplate the variance of the stochastic process.

Therefore, I propose a different experiment: take a Bernoulli experiment of probability p + N(0, e). (N is some noise of expected value = 0 and standard deviation = e. Of course, trim p + N(0, e) to the interval [0, 1].)

For very small e, e.g. e = 1/100, the result of the experiment with p = .7 + N(0, e) is almost the same as the experiment without noise, p = .7.

For very large e, the result of p + N(0, e) is the same as the result for 1/2. That's because the noise is larger than 1 in magnitude most of the time, so the trimmed p + N is 0 or 1 most of the time. In other words: the probability of winning under very noisy conditions no longer depends much on the initial chances (the probability conditioned by a move), but on the stochastic process (the playout). And because the playout is balanced, its expected value is 1/2. That's why the larger the variance, the more p is biased towards 1/2.

Example: if the game could end in a 4 move long playout, and the first move is studied, the probability of a win given a first "good" move is significantly bigger than the probability after a "bad" move. But if the playout is 1000 moves long, the probability of winning after a first "good" move is almost the same as after a first "bad" move -> 1/2.

The p + N(0, e) model represents MC much better than the p model.

And then I go one step further: when the playouts are uniformly random (= each legal move has the same probability of being selected) and the number of legal moves for each side is balanced, then our MC evaluator is a measure of a very simple rate each node has: the number of subpaths ending in a win / the number of all subpaths through that node. I call this quantity the W of a node. W always exists and, if the tree could be searched to its end, it could be computed directly. Because the search cannot be completed, it can be estimated by the MC playout.

In other words: a (uniformly distributed) MC playout is nothing else but _an estimator of W_. And it is a biased estimator, for the reasons explained above.

Again:

p-hat (= #wins / #playouts) is an unbiased estimator of p.
p is a biased (towards 1/2) "estimator" of W.
W always exists (= #subpaths ending in a win / #subpaths). (Except in trivial cases, #subpaths is astronomical => W cannot be computed directly.)

I hope it is clear this time. For me, it makes so much sense that it becomes obvious when understood, but my way of explaining it is surely improvable. ;-)

Jacques.
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
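A small simulation of the p + N(0, e) model above may make the bias concrete (my own illustration of the argument, with arbitrary values for p and e):

    import random

    def noisy_bernoulli(p, e, trials=200000):
        """Estimate the win rate of a Bernoulli experiment whose success
        probability is p + N(0, e), trimmed to [0, 1], as in the text."""
        wins = 0
        for _ in range(trials):
            q = min(1.0, max(0.0, random.gauss(p, e)))
            if random.random() < q:
                wins += 1
        return wins / trials

    # p = 0.7: with little noise the estimate stays near 0.7,
    # with large noise it is pulled toward 1/2.
    for e in (0.01, 0.3, 3.0):
        print(e, noisy_bernoulli(0.7, e))
    # typical output: e=0.01 -> ~0.70, e=0.3 -> ~0.68, e=3.0 -> ~0.53

The larger the noise (the longer and more random the playout), the closer the measured win rate sits to 1/2, which is exactly the bias described above.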
Re: [computer-go] The physics of Go playing strength.
I think you are right, because the MC score becomes precise when only a few available moves are left. However, do you search to the depth of the end of the game, or only far enough that the MC score becomes precise?

-Original Message-
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: computer-go@computer-go.org
Sent: Sun, 8 Apr 2007 2:22 PM
Subject: Re: [computer-go] The physics of Go playing strength.

On Sun, 2007-04-08 at 10:09 -0400, [EMAIL PROTECTED] wrote:
> The question here is not about UCT (yes, it gives the same results as
> alpha-beta). It's about MC scoring. It has not been proved that the MC
> score will generate optimum play with a large enough number of simulations.

MC is obviously wrong as an evaluation function - it is not guaranteed to return a correct or even a good score no matter how many simulations. However, UCT eventually turns into a pure tree where MC is not a factor. These programs, in theory, will play perfect Go given enough time.

- Don

> Now the best supercomputer uses about 500,000 CPUs, which is 2 to the
> 18th power of computation increase. Don's curve can be tested up to
> number 18 now.
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] The dominance of search (Suzie v. GnuGo)
Don Dailey wrote:
> I have this idea that perhaps a good evaluation function could
> replace the play-out portion of the UCT programs.

I thought about something similar, but only for initializing the counters: introduce 10 fake playouts and estimate the number of wins by a function returning something in [0, 10]. After that, use UCT with real playouts. If your guess was right, that should improve performance, but if it was wrong you are doing nothing irreversible except losing time.

Jacques.
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
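A minimal sketch of that initialization (the 10-playout scale follows the description above; evaluate() is a placeholder for whatever heuristic function is used):

    # Hypothetical node initialization: seed the UCT counters with 10 "fake"
    # playouts whose win count comes from a static evaluation in [0, 10].
    FAKE_PLAYOUTS = 10

    def init_node(node, evaluate):
        prior_wins = evaluate(node.state)   # heuristic guess in the range 0..10
        node.visits = FAKE_PLAYOUTS         # as if 10 playouts had already been run
        node.wins = prior_wins              # of which prior_wins were wins
        # Real playouts are added on top; a wrong guess is gradually
        # washed out by genuine simulations, as noted above.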
Re: [computer-go] The dominance of search (Suzie v. GnuGo)
> > I have this idea that perhaps a good evaluation function could
> > replace the play-out portion of the UCT programs.
>
> I thought about something similar, but only for initializing the counters:
> introduce 10 fake playouts and estimate the number of wins by a function
> returning something in [0, 10]. After that, use UCT with real playouts.
> If your guess was right, that should improve performance, but if it was
> wrong you are doing nothing irreversible except losing time.

Another option is to replace the playout by a partial playout followed by a static evaluation, which would probably be deterministic (that's acceptable since the partial playout introduces plenty of non-determinism).
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
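A corresponding sketch of that second option - a partial playout followed by a static evaluation; the move-generation interface, evaluate() and the 30-move cutoff are placeholder assumptions:

    import random

    def truncated_playout(state, evaluate, max_moves=30):
        """Play a limited number of random moves, then score the position
        statically instead of playing to the end of the game."""
        for _ in range(max_moves):
            moves = state.legal_moves()
            if not moves:                   # nothing left to play
                break
            state = state.play(random.choice(moves))
        return evaluate(state)              # e.g. 1.0 = win, 0.0 = loss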
Re: [computer-go] Simplified MC evaluator ¿explained?
On 08/04/07, Jacques Basaldúa <[EMAIL PROTECTED]> wrote: I will try to explain it better: What had puzzled me until recently is how to combine two facts: a. As Jason says, "A single MC playout corresponds to a Bernoulli trial with probability p" and therefore, p-hat is distributed as a binomial which converges to the normal. See also: http://en.wikipedia.org/wiki/Bernoulli_process Regards Martin ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] The physics of Go playing strength.
> A few weeks ago I announced that I was doing a long term > scalability study with computer go on 9x9 boards. > http://greencheeks.homelinux.org:8015/~drd/public/study.jpg Hi Don, This is very interesting. I envy your persistent energy and enthusiasm. If you'd stopped at level 6 or 7 the conclusion could be quite different (they seem to have started to taper off). Are there are any plans to continue with a level using even more playouts? I feel like I am watching a soap opera and now I desperately want to know what happens next week: heavy has started to tail off, but lite has suddenly pulled out of its slump. (Though we could be seeing an anomaly due to the strong versions not playing enough games yet?) I hope you will make the games available for download: I'd like to study the games of the strongest version to see where it is strong and weak. Darren ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] The dominance of search (Suzie v. GnuGo)
> ...XCell Journal. Search on the net for "Lorenz, Donninger, Hydra" and
> format "pdf". But in these papers the concept is only mentioned ...

Thanks Chrilly. For anyone else interested, it is here:
http://www.xilinx.com/publications/xcellonline/xcell_53/xc_pdf/xc_hydra53.pdf

But, as you say, the "search tree as an adaptable error filter" idea is only mentioned in passing. I guess I'll just have to wait for Ulf Lorenz to translate his dissertation into English :-).

> Ulf has used this model for a project to improve the robustness of
> airplane-schedules. ...

Interesting. It is always motivating to hear about game theory getting applied to the real world. (And having been stuck in Amsterdam airport for 5 hours because KLM "forgot to schedule a pilot" for my flight, I think the airline industry needs all the help it can get!)

Darren

--
Darren Cook
http://dcook.org/mlsn/ (English-Japanese-German-Chinese free dictionary)
http://dcook.org/work/ (About me and my work)
http://dcook.org/work/charts/ (My flash charting demos)
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] The physics of Go playing strength.
On Mon, 2007-04-09 at 09:48 +0900, Darren Cook wrote: > > A few weeks ago I announced that I was doing a long term > > scalability study with computer go on 9x9 boards. > > http://greencheeks.homelinux.org:8015/~drd/public/study.jpg > > Hi Don, > This is very interesting. I envy your persistent energy and enthusiasm. > > If you'd stopped at level 6 or 7 the conclusion could be quite different > (they seem to have started to taper off). Are there are any plans to > continue with a level using even more playouts? I feel like I am > watching a soap opera and now I desperately want to know what happens > next week: heavy has started to tail off, but lite has suddenly pulled > out of its slump. (Though we could be seeing an anomaly due to the > strong versions not playing enough games yet?) > > I hope you will make the games available for download: I'd like to study > the games of the strongest version to see where it is strong and weak. > > Darren I plotted with gnuplot using smooth bezier smoothing. Also, I don't believe there is enough data at the higher levels to give the most accurate curve. Still, you can get the general idea. I have waited a few weeks to get as much data as I have and I will continue to run this for a few more weeks. I have all the games and results available to anyone who wants to analyze it. I cannot run at higher levels, it would just take too long. However I could add another program to the mix. It would be fun to put a strong version of Mogo to form a 3rd curve. - Don > ___ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/