We can say with a high degree of statistical certainty that humans
improve steadily at chess with each doubling of thinking time. This is
not a hunch, a guess, or a theory: we know exactly how much computers
improve with extra time, and we also know that humans play better
relative to computers as you add time to the clock, a trend that holds
even up to correspondence chess. So humans have an Elo-versus-time
curve similar to the graph in our study, only even better than the
computers'. Again I emphasize: this is not speculation, it is an
observed fact.
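
To make the "Elo per doubling" picture concrete, here is a tiny Python
sketch of how a fixed gain per doubling of thinking time translates
into expected score against the 1x-time version of the same player.
The 100 Elo per doubling is purely an illustrative number, not a figure
from the study:

    import math

    def expected_score(elo_diff):
        # Standard Elo expected score for a player rated elo_diff above the opponent.
        return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

    GAIN_PER_DOUBLING = 100.0  # hypothetical Elo gain per doubling of thinking time

    for doublings in range(7):
        diff = doublings * GAIN_PER_DOUBLING
        print(f"{2 ** doublings:3d}x time: {diff:+6.0f} Elo, "
              f"expected score {expected_score(diff):.3f}")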

That being the case, we have to take care in how we construct any
experiment involving human versus computer play in Go. Go is a
different game, so we don't know whether the same rule holds; it could
even be the opposite, with computers gaining more than humans from
extra thinking time. The point is that we should not make any
assumptions about this.

I guess what I'm saying is that anyone who runs such an experiment must
publish the exact conditions of the rated games; otherwise the test is
meaningless.
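
As a sketch of what I mean, even a simple structured record for every
rated game would be enough. The field names below are just an example
of the kind of information I would want to see published, not a
proposed standard:

    import json
    from dataclasses import dataclass, asdict
    from typing import Optional

    @dataclass
    class GameConditions:
        # Everything needed to interpret one rated human-vs-computer game.
        program: str                      # engine name and version
        hardware: str                     # CPU / number of cores it ran on
        playouts: Optional[int]           # fixed playout count, or None if time-based
        time_per_move_s: Optional[float]  # fixed seconds per move, or None if playout-based
        board_size: int
        komi: float
        handicap: int

    game = GameConditions(program="ExampleBot 1.0", hardware="4-core 2.4 GHz",
                          playouts=16000, time_per_move_s=None,
                          board_size=19, komi=7.5, handicap=0)
    print(json.dumps(asdict(game), indent=2))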

Also, I suggest that such a test is more useful if you hold something
constant, such as the time control or the number of playouts. Of
course, if you hold the number of playouts constant you are testing the
scalability of the human players! I believe it is more interesting to
test the scalability of Go programs, but at some point we should try to
understand both, as we do in chess.

Another useful test is to get solid ratings for scalable Go programs
playing at several different levels, perhaps starting at 1 second per
move and working up to a minute or more per move. Set the time control
and let the computer manage its own time, then see whether giving the
human more time makes him play worse against the computer.
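
Here is a rough Python sketch of that kind of ladder, using the
standard Elo formula to turn each level's score into a rating
difference. The win/loss numbers are placeholders to be replaced with
real results against rated opponents:

    import math

    def elo_diff_from_score(wins, losses, draws=0):
        # Elo difference implied by a raw score against opponents of
        # equal strength. Undefined at 0% or 100% scores.
        score = (wins + 0.5 * draws) / (wins + losses + draws)
        return 400.0 * math.log10(score / (1.0 - score))

    # Hypothetical ladder: seconds per move doubling from 1 up to 64.
    ladder = {1: (10, 40), 2: (14, 36), 4: (20, 30), 8: (25, 25),
              16: (30, 20), 32: (34, 16), 64: (37, 13)}

    for seconds, (w, l) in ladder.items():
        print(f"{seconds:3d} s/move: {w}-{l}  ->  {elo_diff_from_score(w, l):+7.1f} Elo")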

Studies against humans are necessarily messy. Humans have good days and
bad days, they don't play consistently, and they vary significantly in
their ability to beat computers. So it will be harder to gather
evidence on this, but it is worthwhile. I hope Rémi decides to do this
study.

- Don



Rémi Coulom wrote:
> Don Dailey wrote:
>> They seem under-rated to me also.   Bayeselo pushes the ratings together
>> because that is apparently a valid initial assumption.   With enough
>> games I believe that effect goes away.
>>
>> I could test that theory with some work.    Unless there is a way to
>> turn that off in bayeselo (I don't see it) I could rate them with my own
>> program.
>>
>> Perhaps I will do that test.
>>
>> - Don
> The factor that pushes ratings together is the prior virtual draws
> between opponents. You can remove or reduce this factor with the 
> "prior" command. (before the "mm" command, you can run "prior 0" or
> "prior 0.1"). This command indicates the number of virtual draws. If I
> remember correctly, the default is 3. You may get convergence problems
> if you set the prior to 0 and one player has 100% wins.
>
> The effect of the prior should vanish as the number of games grows.
> But if the winning rate is close to 100%, it may take a lot of games
> before the effect of these 3 virtual draws becomes small. It is not
> possible to reasonably measure rating differences when the winning
> rate is close to 100% anyway.
>
> Instead of playing UCT bot vs UCT bot, I am thinking about running a
> scaling experiment against humans on KGS. I'll probably start with 2k,
> 8k, 16k, and 32k playouts.
>
> Rémi
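
For anyone who wants to see the virtual-draw effect Rémi describes,
here is a toy illustration. It uses a plain maximum-likelihood fit with
each virtual draw counted as half a win and half a loss, which is only
a crude stand-in for what bayeselo actually does:

    import math

    def elo_diff(wins, losses, virtual_draws=0.0):
        # Elo difference implied by a head-to-head record; the virtual
        # draws act like a prior that pulls the estimate toward zero.
        w = wins + 0.5 * virtual_draws
        l = losses + 0.5 * virtual_draws
        score = w / (w + l)
        return 400.0 * math.log10(score / (1.0 - score))

    # A 30-2 head-to-head record, with and without 3 virtual draws:
    print(f"no prior:        {elo_diff(30, 2):+.0f} Elo")     # about +470
    print(f"3 virtual draws: {elo_diff(30, 2, 3):+.0f} Elo")  # about +382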

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
