2010/2/18 <dhillism...@netscape.net>

> Ingo,
>
> I'm not a proper statistician, but I believe there's a crucial second step
> that's missing in your analysis of significance. Even if this were the only
> computer-go test that you personally had ever conducted, we would
> nevertheless need to take into account all of the other tests being
> conducted within the community. On any given day, some high number of
> similar tests are carried out by members of this list. They are testing
> different hypotheses to be sure, but that doesn't get us off the hook at
> all.
>
> What it boils down to is this: how frequently does *somebody* get a 95%
> confidence result about *something* that isn't going to hold up under
> further testing? This issue comes up all the time in epidemiology (e.g.
> cancer clusters near power lines), medical studies, bioinformatics, etc.
>


When such results are reported, it is usually because the experimenter
"felt" that he had a good result. When the same experimenter gets a bad
result and is motivated to believe that what he is trying will work, he
will probably conclude that the bad result came from running the
experiment incorrectly, and he will try something else.

So what you are going to get is a biased sample of (mostly) good
results. Nor can you ever be sure that the experimenter did not
subconsciously accumulate good results, tweaking the experiment as he
goes (and throwing out the bad tweaks).
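Here is a small Python sketch of Dave's point (my own illustration; the numbers are invented, not from anyone's reported test): if many people across the community each test a change that actually does nothing, and only the "significant" results get posted, the list will still see a steady stream of 95% confidence results.

```python
import random

random.seed(1)

def run_test(n_games, true_winrate=0.5):
    """Play n_games against a fixed opponent; the change has no real effect."""
    wins = sum(random.random() < true_winrate for _ in range(n_games))
    # Normal approximation to the binomial: |z| > 1.96 ~ 95% two-sided.
    z = (wins - n_games * 0.5) / (0.25 * n_games) ** 0.5
    return abs(z) > 1.96

experiments = 1000  # tests run across the whole community
false_alarms = sum(run_test(400) for _ in range(experiments))
print(false_alarms)  # roughly 5% of 1000: dozens of "discoveries" of nothing
```

With selective reporting, those few dozen spurious "discoveries" are exactly the ones that get posted.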

I'm not suggesting that bad results should be reported and factored in, but
if someone believes they have found a good algorithm and have results to
report, the experiment should be repeatable and should be verified by the
entire community. This does not imply any dishonesty; it just needs to be
done that way. I have conducted experiments myself that returned results
well ahead of statistical significance, only to discover that my setup was
flawed. For instance, I remember one case where the "improved" version
accidentally corrected a bug which was not supposed to be part of the
experiment.

I'm not trying to "refute" anything that has been reported, but I don't see
any science here yet. I would like to see a serious study based on a
specific proposal or algorithm, and I have yet to see one.

Don't forget that when you report results with error bars, the test length
has to be determined in advance. You cannot just stop the test when the
confidence interval satisfies you; you have to decide in advance that you
are going to run N games and then interpret what you see based on exactly
N games.

Don

>
> - Dave Hillis
>
>
>
> -----Original Message-----
> From: "Ingo Althöfer" <3-hirn-ver...@gmx.de>
> To: computer-go@computer-go.org
> Sent: Thu, Feb 18, 2010 7:28 am
> Subject: [computer-go] Re: Dynamic Komi at 9x9 ?
>
> Hello Don,
> several very good points by you!
>
>
> > Does anyone have data based on several thousands games
> > that attempts to measure the effect of dynamic komi?
> > I would like to see results that are statistically meaningful.
>
> I had eight handplayed (4 + 4) games on 19x19 with very
> high handicap, where the version with dynamic komi (rule 42)
> gained a 3-1 score and the version with static komi
> performed 0-4 versus the same opponent. This is evidence
> in the 95% region that the version with dynamic komi is
> not weaker than the static version.
>
> > We need to see a few thousand games played
>
> A few hundred or even a few dozen may be sufficient when
> the outcome is very clear.
>
> > against a fixed opponent WITH dynamic komi, and
> then the same program without dynamic komi playing
> > against the same opponent with the same number
> > of games.   The number of games must be decided before
> > the test is run, or the error margin calculation is
> > meaningless.
>
> I am willing to provide the statistical part, when programmers
> run the experiments.
>
>
> > As far as I can tell, nobody has yet to produce anything more
> > than anecdotal evidence that this works.
>
> I have. See the 4 + 4 games mentioned above,
> played with my "rule 42".
>
> > Having a person manually adjusting this after every game is
> completely non-scientific, unless they are doing it in a fixed
> > way with no decision making on their part
>
> Right.
>
> > and they are playing thousands of games (or at least
> > enough to get statistically significant results.)
>
> Right, especially the bracketed part of your sentence.
>
> > I'm not trying to rain on anyone's parade,  but I cannot
> > understand why no one has produced a statistically meaningful
> > result on this subject -
>
> I would have. Unfortunately I am not a programmer, and am also
> not adept at modifying program code to include dynamic komi.
>
> But, to repeat it, I am willing to do statistical home
> work.
>
> > I am genuinely interested in this since I never was able to
> > make it work when I spent about one intense week on it.
> > (I did not do this with handicap games, but with normal games.)
>
> Your sentence in brackets is crucial. I only proposed to use
> dynamic komi in games with high handicap. In particular, I had in
> mind the situation where the stronger side (giving high handicap)
> is MC-based.
>
> Perhaps, 9x9 instead of 19x19 makes it easier for some programmer
> to start test series with dynamic komi.
>
> Ingo.
>
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>
>