> From: Don Dailey <[EMAIL PROTECTED]>
> ...
> > Rémi Coulom wrote:
> > ...
> > Instead of playing UCT bot vs UCT bot, I am thinking about running a
> > scaling experiment against humans on KGS. I'll probably start with 2k,
> > 8k, 16k, and 32k playouts.
> That would be a great experiment. [...]
3 kyu at this level is a lot for a person. I've known club players who never
got better than 9k, and people who study and play may still take a year or
more to make this much improvement.
Many club players stall somewhere between 7k and 4k and never get any
better.
David
On Thu, Jan 24, 2008 at 03:19:52PM +0100, Magnus Persson wrote:
>
> As a rule of thumb I want 300 games for each datapoint and at least
> 500 if I am going to make any conclusions.
OK, I think we are starting to have those 500 games.
To my eyes, FatMan shows a clear turn in the curve at FatMan_03. [...]
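For reference, the arithmetic behind those sample sizes, as a rough Python sketch (the normal approximation and the numbers here are mine, not Magnus's):

    # 95% confidence interval half-width on a measured winrate;
    # the worst case is p = 0.5.
    import math

    def winrate_ci_halfwidth(games, p=0.5, z=1.96):
        return z * math.sqrt(p * (1 - p) / games)

    for n in (300, 500):
        print(n, "games: 50% +/-", round(100 * winrate_ci_halfwidth(n), 1), "%")
    # 300 games: 50% +/- 5.7 %
    # 500 games: 50% +/- 4.4 %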
I don't feel like searching for it right now, but not too long ago someone posted a link to a chart that gave the winrates and equivalent rankings for different
rating systems.
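The conversion such charts encode is presumably the standard logistic Elo model; a sketch of it (the standard formula, not the specific chart that was posted):

    # Winrate <-> Elo difference under the logistic Elo model.
    import math

    def elo_diff(winrate):
        return 400 * math.log10(winrate / (1 - winrate))

    def expected_score(elo_delta):
        return 1 / (1 + 10 ** (-elo_delta / 400))

    print(elo_diff(0.75))       # ~191 Elo
    print(expected_score(100))  # ~0.64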
Don Dailey wrote:
I wish I knew how that translates to win expectancy (ELO rating.) Is
3 kyu at this level a pretty significant improvement?
- Don
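A back-of-envelope answer, assuming the often-quoted figure of roughly 100 Elo per kyu step (that conversion factor is an assumption, not an official KGS number):

    # Rough sketch: 3 kyu of improvement in Elo and win expectancy.
    ELO_PER_KYU = 100   # assumed rule of thumb, not an official figure
    delta = 3 * ELO_PER_KYU
    win_expectancy = 1 / (1 + 10 ** (-delta / 400))
    print(delta, win_expectancy)   # 300 Elo, ~0.85

On that assumption the stronger version scores about 85% against the weaker one, which would indeed be a very significant improvement.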
Hiroshi Yamashita wrote:
>> Instead of playing UCT bot vs UCT bot, I am thinking about running a
>> scaling experiment against humans on KGS. I'll probably start with
>> 2k, 8k, 16k, and 32k playouts.
We can say with absolute statistical certainty that humans when playing
chess improve steadily with each doubling of time. This is not a
hunch, guess or theory; it's verified by the FACT that we know exactly
how much computers improve with extra time and we also know for sure
that humans play [...]
Hiroshi Yamashita wrote:
>> What are the time controls for the games?
>
> Both are 10 minutes + 30 seconds byo-yomi.
>
> Hiroshi Yamashita
Good. I think that is a good way to test.
- Don
What are the time controls for the games?
Both are 10 minutes + 30 seconds byo-yomi.
Hiroshi Yamashita
What are the time controls for the games?
- Don
Hiroshi Yamashita wrote:
>> Instead of playing UCT bot vs UCT bot, I am thinking about running a
>> scaling experiment against humans on KGS. I'll probably start with
>> 2k, 8k, 16k, and 32k playouts.
>
> I have a result on KGS.
>
> AyaMC 6k (5.9k) 16po http://www.gokgs.com/graphPage.jsp?user=AyaMC
> [...]
Jeff Nowakowski wrote:
> On Tue, 2008-01-29 at 17:41 -0500, Don Dailey wrote:
>
>> This is in response to a few posts about the "self-test" effect in ELO
>> rating tests.
>>
> [...]
>
>> So my assertion is that scalability based on sound principles is more or
>> less universal with perhaps a small amount of self-play distortion, but [...]
Rémi Coulom wrote:
> Don Dailey wrote:
>> They seem under-rated to me also. Bayeselo pushes the ratings together
>> because that is apparently a valid initial assumption. With enough
>> games I believe that effect goes away.
>>
>> I could test that theory with some work. Unless there is a way to
>> turn that off in bayeselo (I don't see it) I could [...]
Instead of playing UCT bot vs UCT bot, I am thinking about running a
scaling experiment against humans on KGS. I'll probably start with 2k,
8k, 16k, and 32k playouts.
I have a result on KGS.
AyaMC 6k (5.9k) 16po http://www.gokgs.com/graphPage.jsp?user=AyaMC
AyaMC2 9k (8.4k) 1po http:
He stated that there is a depth limit in FatMan. IMO, it is quite likely that
this is the limiting factor.
On Tue, 2008-01-29 at 17:41 -0500, Don Dailey wrote:
> This is in response to a few posts about the "self-test" effect in ELO
> rating tests.
[...]
> So my assertion is that scalability based on sound principles is more or
> less universal with perhaps a small amount of self-play distortion, but
> [...]
Don Dailey wrote:
They seem under-rated to me also. Bayeselo pushes the ratings together
because that is apparently a valid initial assumption. With enough
games I believe that effect goes away.
I could test that theory with some work. Unless there is a way to
turn that off in bayeselo (I don't see it) I could [...]
In other tests I've done, bayeselo comes out pretty much the same as an
iterative method I use where players are batch rated over and over until
they converge.
I did a similar thing with the current data and it comes out very
similar to bayeselo, so I have no reason to believe that with a few
hundred [...]
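For concreteness, a minimal sketch of that kind of iterative batch rater (my reconstruction of the general idea, not Don's actual code):

    # Re-rate every player from the full batch of games, over and over,
    # shrinking the step size so the ratings settle.
    # games: list of (player_a, player_b, score_for_a), score in {0, 0.5, 1}.
    from collections import defaultdict

    def batch_rate(games, iters=500, k=8.0):
        rating = defaultdict(float)            # everyone starts at 0 Elo
        for it in range(iters):
            step = k / (1 + it / 50)           # decay steps for convergence
            delta = defaultdict(float)
            for a, b, score in games:
                expect = 1 / (1 + 10 ** ((rating[b] - rating[a]) / 400))
                delta[a] += step * (score - expect)
                delta[b] -= step * (score - expect)
            for p, d in delta.items():
                rating[p] += d
        return dict(rating)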
Yes, you are right. But gnugo's 1800 rating is the only real point of
reference that I have. As you get farther away from 1800 I believe
it's the case that the "true" rating can be sloppy.
- Don
Sylvain Gelly wrote:
>
> between pairs
> of programs, you can get a more and more confident belief about
> the actual ELO. So they'll converge to the correct values, and
> should do so reasonably rapidly.
This is in response to a few posts about the "self-test" effect in ELO
rating tests.
I'll start by claiming right up front that I don't believe, for certain
types of programs, that this is something we have to worry unduly
about. I'll explain why I feel that way in a moment.
One general observation [...]
On Tue, 29 Jan 2008, Álvaro Begué wrote:
Of course the value of A is the interesting number here. If they are using
anything like the common UCT exploration/exploitation formula, I think we
will see a roof fairly soon. In my own experiments at long time controls,
the program would spend entirely [...]
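For readers following along: the "common UCT exploration/exploitation formula" is presumably plain UCB1, in which the constant (the A above) scales the exploration term. A sketch:

    # UCB1 child-selection value; A is the exploration constant in question.
    import math

    def ucb1(wins, visits, parent_visits, A):
        if visits == 0:
            return float("inf")   # unvisited children get tried first
        return wins / visits + A * math.sqrt(math.log(parent_visits) / visits)

The search descends into the child maximizing this value; a larger A explores more, a smaller A exploits the current best move more.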
>
> between pairs
> of programs, you can get a more and more confident belief about
> the actual ELO. So they'll converge to the correct values, and
> should do so reasonably rapidly.
>
You are right. My point was that here we have only 1 fixed rating, which is
very low, and all the higher levels [...]
On Jan 29, 2008 3:18 PM, steve uurtamo <[EMAIL PROTECTED]> wrote:
> > But that's not relevant for this study. All that matters is the 14 out
> > of 20 total score and the order should not matter one little bit.
> > With players that change, that is very relevant however.
>
> lemme think that over.
sorry, of course you're right -- linearity of expectation,
after all.
the total number of games is relevant for the std. dev.,
though, which has settled down for everyone but
the new guys to a quite respectably small number of elo points.
s.
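To put rough numbers on that: the one-sigma Elo error near an even score, propagating the binomial standard deviation through the Elo curve (my arithmetic, a normal-approximation sketch):

    # One-sigma Elo uncertainty after n roughly even games.
    import math

    def elo_sigma(n_games, p=0.5):
        sigma_p = math.sqrt(p * (1 - p) / n_games)
        slope = 400 / math.log(10) * (1 / p + 1 / (1 - p))   # dElo/dp
        return sigma_p * slope

    print(elo_sigma(100))   # ~35 Elo
    print(elo_sigma(500))   # ~16 Elo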
> But that's not relevant for this study. All that matters is the 14 out
> of 20 total score and the order should not matter one little bit.
> With players that change, that is very relevant however.
lemme think that over.
by the way, attached is what a quadratic fit looks like to the data.
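(For anyone who wants to redo such a fit, a sketch; the data points below are invented for illustration, not the study's actual numbers:)

    # Quadratic least-squares fit of rating vs. doublings of playouts.
    import numpy as np

    doublings = np.array([0, 1, 2, 3, 4])            # log2(playouts / base)
    elo = np.array([1500, 1690, 1850, 1980, 2080])   # hypothetical ratings

    fit = np.poly1d(np.polyfit(doublings, elo, deg=2))
    print(fit(5))   # extrapolate one more doubling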
steve uurtamo wrote:
> it is a good thing to make your prior knowledge completely
> fair (in the sense of not having any bias) when doing bayesian
> calculations. any estimator being used will reshape that
> knowledge on the fly.
>
> the idea is that your prior knowledge of the ELO ranking should
> be about the same for every single [...]
it is a good thing to make your prior knowledge completely
fair (in the sense of not having any bias) when doing bayesian
calculations. any estimator being used will reshape that
knowledge on the fly.
the idea is that your prior knowledge of the ELO ranking should
be about the same for every single [...]
Sylvain, in the download notes, you mention that Mogo has some troubles with
"very long" timescales, due to the low resolution of single floats. Do you have
any estimate of how many simulations would lead to this situation?
Terry McIntyre <[EMAIL PROTECTED]>
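For what it's worth, the generic IEEE-754 answer (standard float behaviour, not knowledge of Mogo's internals): a single-precision count loses integer resolution at 2^24, i.e. around 16.8 million simulations:

    # float32 can represent every integer only up to 2**24 = 16,777,216;
    # past that, incrementing a single-precision counter is a no-op.
    import numpy as np

    count = np.float32(2 ** 24)
    print(count + np.float32(1) == count)   # True: the increment is lost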
They seem under-rated to me also. Bayeselo pushes the ratings together
because that is apparently a valid initial assumption. With enough
games I believe that effect goes away.
I could test that theory with some work. Unless there is a way to
turn that off in bayeselo (I don't see it) I could [...]
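A toy illustration of that pull toward the middle (my sketch, not bayeselo's actual internals): a prior acts like a few virtual draws against an average opponent, so its effect fades as real games accumulate:

    # Prior as v virtual games at 50% against a 0-Elo reference player.
    import math

    def shrunk_elo(wins, losses, v=4):
        p = (wins + v / 2) / (wins + losses + v)
        return 400 * math.log10(p / (1 - p))

    print(shrunk_elo(8, 2))     # ~159 Elo (the raw 10-game estimate: ~241)
    print(shrunk_elo(80, 20))   # ~229 Elo: more games, less shrinkage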
>
> but not linearly and you can see a nice gradual curve in the plot.
>
> Now we have something we can argue about for weeks. Why is it not
> mostly linear? Could it be the memory issue I just mentioned?
>
Hi Don and all participants in that study, that is very interesting!
The memory constraint [...]
No, FatMan is a primitive UCT player, so it builds a tree in memory.
I am not willing to extend it beyond the current level as it is a
resource hog. Mogo is playing at the same strength in far less time.
- Don
Jason House wrote:
> On Jan 29, 2008 10:01 AM, Don Dailey <[EMAIL PROTECTED]> wrote:
> [...]
On Jan 29, 2008 10:01 AM, Don Dailey <[EMAIL PROTECTED]> wrote:
> FatMan seems to hit some kind of hard limit rather suddenly. It
> could be an implementation bug or something else - I don't really
> understand this. It's very difficult to test a program for
> scalability since you are limited [...]
I'd be willing to donate some CPU time. I run Ubuntu linux on a P4 3ghz with
1gig RAM. Can be configured with or w/o HT. (usually leave it off)
-Josh
On Jan 29, 2008 10:01 AM, Don Dailey <[EMAIL PROTECTED]> wrote:
> The 9x9 scalability study has been a huge success with 35 CPUs
> participating and several volunteers. [...]
The 9x9 scalability study has been a huge success with 35 CPUs
participating and several volunteers. This means we got about a month
of testing done per day and have the equivalent of about a year's worth
of data or more. We are considering whether to extend the study a bit
more to test some [...]