On Mon, Nov 26, 2012 at 3:46 PM, Mark Boon <[email protected]> wrote:
> I would imagine that "practical significance" depends on the absolute
> level. Between two beginners 23% means little as it can be overtaken
> by a day's study. Between top-professionals it probably means the
> difference between a legendary 9p winning many top-title tournaments
> and a 9p who never wins a top title in his life.

True, but you are making a statement about the stability of the rating
or strength of the player. I am assuming a reliable and stable rating
difference, not an ELO guess. You would not be able to claim a
practical superiority over anyone if you had only played 5 or 10 games
in your life, as a beginner might.

But I do get your point - it's a matter of perception in the case of a
long-time pro, but in the case of a beginner, even if the superiority is
the same, it is much more subject to change over time.

Another way to see this is that if you are a 40-year-old 2 dan player
and you have even chances against an 8-year-old prodigy, he is already
going to be perceived as the superior player, because he surely will be
within a few weeks or months.

Don

> Mark
>
> On Mon, Nov 26, 2012 at 4:13 AM, Don Dailey <[email protected]> wrote:
> >
> > On Mon, Nov 26, 2012 at 4:05 AM, "Ingo Althöfer" <[email protected]>
> > wrote:
> >>
> >> One general comment:
> >>
> >> Ratings are not transitive. For instance,
> >> A1 may score 25 % against B,
> >> and A2 may score 22 % against B.
> >> Then it cannot be concluded that A1 will score more than 50 %
> >> in a direct duel with A2.
> >>
> >> It is rather easy to construct triples of "semi-simple" agents
> >> A, B, C for some "normal" game where
> >> A scores 95+ percent against B,
> >> B scores 95+ percent against C,
> >> C scores 95+ percent against A.
> >
> > Hi Ingo,
> >
> > The ELO system, which tries to model game playing skill
> > mathematically, makes some assumptions that are not completely true,
> > but are approximations to the reality. One assumption made by the
> > ELO system is that skill IS transitive. It works quite well because
> > in practice human skill and program skill are nearly transitive. So
> > it has proven to be a very good model indeed.
> >
> > As you say, it is not difficult to artificially construct classes of
> > players who do not have transitive relationships between each other.
> > One very simple way to do this is to take 3 equal players and give
> > them each a different opening book such that the book will get them
> > quickly into losing or winning situations against each other. You
> > can create your own "rocks/paper/scissors" non-transitive
> > relationship this way.
> >
> > You can also do it with the playing algorithm; it's a bit more
> > difficult but certainly possible. You give one program a serious
> > weakness that one of the other 2 can easily exploit but that the
> > other program cannot exploit - so each program has a unique
> > exploitable weakness that only one of the other 2 programs can
> > exploit.
> >
> > Don
> >
> >>
> >> Ingo.
> >>
> >> -------- Original Message --------
> >> > Date: Sun, 25 Nov 2012 17:03:33 -0800
> >> > From: Leandro Marcolino <[email protected]>
> >> > To: [email protected]
> >> > Subject: [Computer-go] Practical significance?
> >>
> >> > Hello all!..
> >> >
> >> > I am currently doing research on Computer Go. I can't give the
> >> > details yet, but I will post them here after (if) my paper is
> >> > accepted...
> >> >
> >> > In my research I compare many systems (An), playing against a
> >> > fixed strong adversary (B). So A1 would have a percentage of
> >> > victory x1 against B, while A2 would have a percentage of victory
> >> > x2, etc... Then I compare the percentages of victories, and for
> >> > most cases I can show that one system is better than another with
> >> > 95% confidence. However, my adviser is asking me about not only
> >> > the STATISTICAL significance of the results, but also their
> >> > PRACTICAL significance. I mean, if one system is, for example,
> >> > only 1% better than another, with 99% confidence, the result
> >> > would have statistical significance, but wouldn't really matter
> >> > in a practical sense.
> >> >
> >> > In my case, the difference between the systems can range from
> >> > about 4% to about 23%. That doesn't seem to be enough to argue
> >> > that one system would be one handicap stone better than another.
> >> > But what would be the minimum difference for me to argue that one
> >> > system is significantly better than another, in a practical
> >> > sense? (or they are not, in the end?..) Would calculating
> >> > ELO ratings help me in answering this question?
> >> >
> >> > I think it gets even more complex if we think that, let's say,
> >> > changing the percentage of victory from 95% to 100% seems to be
> >> > much more significant (in a practical sense) than changing from
> >> > 30% to 35%, even though the difference between the two systems is
> >> > still only 5%. In my case, I am dealing with percentages of
> >> > victories that range from around 30% to around 53%.
> >> >
> >> > What do you guys think?..
> >> >
> >> > Thanks for your help!..
> >> >
> >> > Regards,
> >> > Leandro
> >> _______________________________________________
> >> Computer-go mailing list
> >> [email protected]
> >> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
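
To put numbers on the question of whether ELO ratings would help: here is a minimal sketch, assuming the standard logistic ELO curve (a 400-point scale, so a win rate p maps to 400 * log10(p / (1 - p)) points). The function names `elo_diff` and `elo_gain` are made up for this illustration and are not from any Go rating tool. It also leans on the transitivity assumption discussed above, since both systems are only ever measured against the same opponent B, so treat the output as a rough guide rather than a proven head-to-head result.

```python
import math


def elo_diff(win_rate):
    """ELO difference implied by a win rate, assuming the logistic ELO model."""
    if not 0.0 < win_rate < 1.0:
        # 0% and 100% map to an infinite rating gap, so reject them.
        raise ValueError("win rate must be strictly between 0 and 1")
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


def elo_gain(rate_a, rate_b):
    """Rating gained when the win rate against the same opponent goes from rate_a to rate_b."""
    return elo_diff(rate_b) - elo_diff(rate_a)


if __name__ == "__main__":
    # The same 5-point change in win rate is worth a different number of
    # rating points depending on where it sits on the curve.
    for a, b in [(0.30, 0.35), (0.30, 0.53), (0.90, 0.95)]:
        print(f"{a:.0%} -> {b:.0%}: {elo_gain(a, b):+.1f} ELO")
```

On this scale the step from 30% to 35% is worth far fewer rating points than the step from 90% to 95%, and 100% cannot be converted at all, which matches the intuition that equal percentage differences are not equally significant.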
