2010/2/18 <dhillism...@netscape.net>

> Ingo,
>
> I'm not a proper statistician, but I believe there's a crucial second step
> that's missing in your analysis of significance. Even if this were the only
> computer-go test that you personally had ever conducted, we would
> nevertheless need to take into account all of the other tests being
> conducted within the community. On any given day, some high number of
> similar tests are carried out by members of this list. They are testing
> different hypotheses, to be sure, but that doesn't get us off the hook at
> all.
>
> What it boils down to is this: how frequently does *somebody* get a 95%
> confidence result about *something* that isn't going to hold up under
> further testing? This issue comes up all the time in epidemiology (e.g.
> cancer clusters near power lines), medical studies, bioinformatics, etc.
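[Editor's note: Dave's point can be made quantitative. Under the simplifying assumption (mine, not the thread's) that the community's tests are independent and each run at the 95% level, the chance that at least one of them turns up a spurious "significant" result grows quickly with the number of tests:]

```python
# Chance that at least one of k independent tests, each run at
# significance level alpha, produces a false positive by luck alone.
def family_wise_error(k, alpha=0.05):
    return 1.0 - (1.0 - alpha) ** k

for k in (1, 5, 20, 50):
    print(f"{k:3d} tests -> P(at least one fluke) = {family_wise_error(k):.2f}")
```

Twenty independent tests already carry a roughly 64% chance that somebody, somewhere, reports a "95% confidence" effect that is pure noise.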
When such results are reported, it is usually because the experimenter "felt" that he had a good result. When the same experimenter gets a bad result, and he is motivated to believe that what he is trying will work, he will probably conclude that the bad result came from running the experiment incorrectly, and he will try something else. So what you are going to get is a random sampling of (mostly) good results. You can never be sure that the experimenter did not subconsciously accumulate good results either, tweaking the experiment as he goes (and throwing out the bad tweaks).

I'm not suggesting that bad results should be reported and factored in, but what should happen is this: if someone believes they have found a good algorithm and have results to report, the experiment should be repeatable, and it needs to be verified by the entire community. This does not suggest any dishonesty; it just needs to be done that way. I have conducted experiments myself that returned results well ahead of statistical significance, only to discover that my setup was flawed. For instance, I remember one case where the improved version "accidentally" corrected a bug which was not supposed to be part of the experiment.

I'm not trying to "refute" any of what has been reported, but I don't see any science here yet. I would like to see a serious study based on a specific proposal or algorithm, and I have yet to see that.

Don't forget that when you report results with error bars, the test length has to be determined in advance. You cannot just stop the test when you feel the confidence interval satisfies you; you have to have determined in advance that you are going to run N games, and then interpret what you see based on exactly N games.

Don

> - Dave Hillis
>
>
> -----Original Message-----
> From: "Ingo Althöfer" <3-hirn-ver...@gmx.de>
> To: computer-go@computer-go.org
> Sent: Thu, Feb 18, 2010 7:28 am
> Subject: [computer-go] Re: Dynamic Komi at 9x9 ?
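[Editor's note: Don's last point, that N must be fixed before the test, can be demonstrated with a toy simulation (my illustration, not from the thread). If you peek at the confidence interval after every game and stop as soon as it looks significant, the nominal 5% false-positive rate is badly inflated, even when the two program versions are exactly equal in strength:]

```python
import random

def peeking_false_positive_rate(max_games=1000, trials=2000, seed=42):
    """Simulate A/B tests between two EQUALLY strong programs (true win
    rate 0.5), stopping the moment the observed win rate leaves the
    nominal 95% interval. Returns the fraction of these null
    experiments that get (wrongly) declared significant."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        wins = 0
        for n in range(1, max_games + 1):
            wins += rng.random() < 0.5
            se = 0.5 / n ** 0.5          # std. error of win rate under H0
            if n >= 30 and abs(wins / n - 0.5) > 1.96 * se:
                false_positives += 1     # stopped early on a fluke
                break
    return false_positives / trials

print(peeking_false_positive_rate())     # far above the nominal 0.05
```

With a fixed N the same 1.96-sigma rule is checked exactly once and the false-positive rate stays at 5%; checked after every game, the random walk gets up to a thousand chances to wander across the line.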
>
> Hello Don,
> several very good points by you!
>
> > Does anyone have data based on several thousands of games
> > that attempts to measure the effect of dynamic komi?
> > I would like to see results that are statistically meaningful.
>
> I had eight hand-played (4 + 4) games on 19x19 with very
> high handicap, where the version with dynamic komi (rule 42)
> gained a 3-1 score and the version with static komi
> performed 0-4 versus the same opponent. This is evidence
> in the 95% region that the version with dynamic komi is
> not weaker than the static version.
>
> > We need to see a few thousand games played
>
> A few hundred or even a few dozen may be sufficient when
> the outcome is very clear.
>
> > against a fixed opponent WITH dynamic komi, and
> > then the same program without dynamic komi playing
> > against the same opponent with the same number
> > of games. The number of games must be decided before
> > the test is run, or the error margin calculation is
> > meaningless.
>
> I am willing to provide the statistical part, when programmers
> run the experiments.
>
> > As far as I can tell, nobody has yet produced anything more
> > than anecdotal evidence that this works.
>
> I have. See the 4 + 4 games mentioned above,
> played with my "rule 42".
>
> > Having a person manually adjusting this after every game is
> > completely non-scientific, unless they are doing it in a fixed
> > way with no decision making on their part
>
> Right.
>
> > and they are playing thousands of games (or at least
> > enough to get statistically significant results.)
>
> Right, especially also the bracketed part of your sentence.
>
> > I'm not trying to rain on anyone's parade, but I cannot
> > understand why no one has produced a statistically meaningful
> > result on this subject -
>
> I would have. Unfortunately I am not a programmer, and am also
> not fit to modify program code to include dynamic komi.
>
> But, to repeat it, I am willing to do the statistical
> homework.
>
> > I am genuinely interested in this since I never was able to
> > make it work when I spent about one intense week on it.
> > (I did not do this with handicap games, but with normal games.)
>
> Your sentence in brackets is crucial. I only proposed to use
> dynamic komi in games with high handicap. Especially I had in
> mind the situation where the stronger side (giving high handicap)
> is MC-based.
>
> Perhaps 9x9 instead of 19x19 makes it easier for some programmer
> to start test series with dynamic komi.
>
> Ingo.
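[Editor's note: the "95% region" claim for the 4 + 4 hand-played games quoted above can be checked directly. A sketch (my own check, not from the thread) using Fisher's exact test on the 2x2 table: dynamic komi 3 wins / 1 loss, static komi 0 wins / 4 losses:]

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test on the 2x2 table [[a, b], [c, d]]:
    sum the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    p_obs = comb(row1, a) * comb(n - row1, col1 - a) / denom
    total = 0.0
    for x in range(max(0, col1 - (n - row1)), min(row1, col1) + 1):
        p = comb(row1, x) * comb(n - row1, col1 - x) / denom
        if p <= p_obs + 1e-12:
            total += p
    return total

# Dynamic komi: 3-1; static komi: 0-4, against the same opponent.
p = fisher_exact_two_sided(3, 1, 0, 4)
print(f"two-sided p = {p:.3f}")   # 0.143 (one-sided: 1/14, about 0.071)
```

Even the one-sided p-value stays above 0.05, so eight games fall short of the 95% level on this test, which is exactly Don's point about sample size.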
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/