We can say with a high degree of statistical certainty that, in chess, humans improve steadily with each doubling of thinking time. This is not a hunch, a guess, or a theory: we know how much computers improve with extra time, and we also know that humans play better relative to computers as you add time to the clock, a trend that holds even up to correspondence chess. So humans have an Elo curve similar to the graph in our study, only steeper than the computers'. Again, I emphasize that this is not speculation.
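As a rough illustration of what a "similar but steeper" Elo curve would mean in practice, here is a small sketch (Python) using the standard Elo expected-score formula. The per-doubling gains for humans and computers are made-up placeholder numbers, chosen only to show how a larger slope for humans would translate into a growing score against the computer as the clock gets longer; they are not measured values from any study.

    import math

    def expected_score(elo_diff):
        # Standard Elo logistic model: expected score for a player who is
        # elo_diff points stronger than the opponent.
        return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

    # Purely illustrative slopes -- NOT measured values.
    COMPUTER_GAIN_PER_DOUBLING = 70   # assumed Elo gained per doubling of time
    HUMAN_GAIN_PER_DOUBLING = 90      # assumed to be larger, per the claim above

    for doublings in range(6):
        gap = doublings * (HUMAN_GAIN_PER_DOUBLING - COMPUTER_GAIN_PER_DOUBLING)
        print(f"{2 ** doublings:2d}x base time: human gains {gap:3d} Elo "
              f"on the computer, expected score {expected_score(gap):.2f}")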
That being the case, we have to take care how we construct any experiment involving human-vs-computer play in Go. Go is a different game, so we don't know if the same rule holds; it could even be just the opposite: perhaps computers play better relative to humans with more thinking time. The point is that we should not make any assumptions about this. What I'm saying is that if anyone does such an experiment, they must publish the exact conditions of the rated games; otherwise the test is meaningless.

Also, I suggest that such a test is more useful if you keep something constant, such as the time control or the number of playouts. Of course, if you keep the number of playouts constant you are testing the scalability of the human players! I believe it's more interesting to test the scalability of Go programs, but at some point we should try to understand both, as we do in chess.

Another useful test is to get solid ratings for scalable Go programs playing at several different levels, perhaps starting at 1 second per move and moving up to 1 or more minutes per move. Set the time control, let the computer manage its time, and see if giving the human more time makes him play worse against computers.

Studies against humans are necessarily messy: humans have good and bad days, don't play consistently, and vary significantly in their ability to beat computers. So it will be more difficult to get evidence on this, but it's worthwhile. I hope Rémi decides to do this study.

- Don

Rémi Coulom wrote:
> Don Dailey wrote:
>> They seem under-rated to me also. Bayeselo pushes the ratings together
>> because that is apparently a valid initial assumption. With enough
>> games I believe that effect goes away.
>>
>> I could test that theory with some work. Unless there is a way to
>> turn that off in bayeselo (I don't see it), I could rate them with my
>> own program.
>>
>> Perhaps I will do that test.
>>
>> - Don
>
> The factor that pushes ratings together is the prior virtual draws
> between opponents. You can remove or reduce this factor with the
> "prior" command (before the "mm" command, you can run "prior 0" or
> "prior 0.1"). This command indicates the number of virtual draws. If I
> remember correctly, the default is 3. You may get convergence problems
> if you set the prior to 0 and one player has 100% wins.
>
> The effect of the prior should vanish as the number of games grows.
> But if the winning rate is close to 100%, it may take a lot of games
> before the effect of these 3 virtual draws becomes small. It is not
> possible to reasonably measure rating differences when the winning
> rate is close to 100% anyway.
>
> Instead of playing UCT bot vs UCT bot, I am thinking about running a
> scaling experiment against humans on KGS. I'll probably start with 2k,
> 8k, 16k, and 32k playouts.
>
> Rémi
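To illustrate Rémi's point about the prior virtual draws, here is a rough back-of-the-envelope sketch (Python). It uses the plain Elo logistic formula and simply counts each virtual draw as half a win and half a loss; bayeselo's actual Bayesian model is more involved, so the numbers are only indicative.

    import math

    def elo_diff(score):
        """Elo difference implied by an expected score (0 < score < 1)."""
        return 400.0 * math.log10(score / (1.0 - score))

    def score_with_prior(wins, games, virtual_draws):
        """Winning fraction after mixing in the prior virtual draws (half a point each)."""
        return (wins + 0.5 * virtual_draws) / (games + virtual_draws)

    for wins in (20, 200, 2000):          # 100% win rate at increasing sample sizes
        for prior in (0, 3):              # default prior in bayeselo is 3 virtual draws
            s = score_with_prior(wins, wins, prior)
            label = "infinite" if s >= 1.0 else f"{elo_diff(s):.0f}"
            print(f"{wins:5d}/{wins} wins, prior {prior}: Elo diff {label}")

With a 100% win rate the estimate never stabilizes; it simply tracks the prior, which illustrates Rémi's remark that rating differences cannot be measured reasonably when the winning rate is close to 100%.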
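To make the proposed scaling test concrete, here is a minimal sketch (Python) of how one might enumerate and publish the exact conditions of such an experiment. The doubling ladder from 1 second to 64 seconds per move, the engine label "uct-bot", and the idea of treating each time setting as a separate "player" for rating purposes are my own illustrative choices, not something specified in the thread.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Condition:
        engine: str
        seconds_per_move: int   # fixed time control; the engine manages its own time

    def doubling_ladder(engine: str, start_s: int = 1, steps: int = 7):
        """Time controls doubling from start_s: 1, 2, 4, ... 64 seconds per move."""
        return [Condition(engine, start_s * 2 ** i) for i in range(steps)]

    if __name__ == "__main__":
        for cond in doubling_ladder("uct-bot"):
            # Label under which this configuration would play rated games,
            # so a rating tool such as bayeselo can rate each level separately.
            print(f"{cond.engine}-{cond.seconds_per_move}s")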