To answer several emails in one:

Hideki: One question, why is the rating of FatMan-1 not 1800?

They are probably not identical - if one plays much faster than the other, it will slightly cripple programs that think on the opponent's time. That's the only explanation I can think of. Also, I fixed FatMan at 1800, not FatMan-1, because FatMan has more games. However, I am going to change this to FatMan-1 since it has many more RECENT games. Of course on CGOS they are both fixed at 1800. This will cause a shift in the entire rating pool, of course.

Hideki: And, I'll be happier if there are links to programs on HOF so that I
Hideki: can see their rating pages.

This is easy - the next version will do this.

Rémi: readpgn games.pgn
Rémi: elo
Rémi: advantage 0   ;no advantage for playing White
Rémi: drawelo 0.01  ;draws are extremely unlikely
Rémi: mm
Rémi: exactdist
Rémi: ratings

I'll switch over to this.

Rémi: No draw at all would be "drawelo 0", but it generates some numerical
Rémi: problems in the algorithm for computing confidence intervals.

Ahh! I tried to set this to zero and got bizarre results, so I left it alone. Now I know what to do.

Rémi: I would be very interested if you could lower your threshold to 197
Rémi: games, and include december results, which would include Crazy Stone in
Rémi: the list ;-)

I plan to do this at the end of each month, strictly month by month, with an automated script (it's mostly automated now, including a little module to convert the results to a pgn format that bayeselo can read). But until I get this fully automated I will just include all the games available from now on. I think Crazy Stone was at the top before I applied these constraints. I want there to be a "price for admission" to get programs to play lots of games.
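In case it's useful to anyone, here is roughly what that pipeline amounts to - a minimal sketch, not my actual module. The input format ("white black result", one game per line) and the file names are invented for illustration, and it assumes the bayeselo binary is on the PATH. As far as I can tell bayeselo only needs the White, Black and Result tags from each game, and it takes its commands on standard input:

# Rough sketch of the results-to-pgn conversion plus a bayeselo run.
# The input format ("white black result", one game per line) and the
# file names are invented - the real CGOS results log looks different.
import subprocess

def results_to_pgn(infile, outfile):
    """Emit one header-only PGN game per result line."""
    with open(infile) as src, open(outfile, "w") as dst:
        for line in src:
            fields = line.split()
            if len(fields) != 3:
                continue                   # ignore malformed lines
            white, black, result = fields  # result: "1-0", "0-1" or "1/2-1/2"
            dst.write(f'[White "{white}"]\n')
            dst.write(f'[Black "{black}"]\n')
            dst.write(f'[Result "{result}"]\n\n')
            dst.write(f'{result}\n\n')     # game text is just the result token

# The command list Rémi suggested, fed to bayeselo on stdin
# (assumes bayeselo quits when it reaches end of input).
COMMANDS = """\
readpgn games.pgn
elo
advantage 0
drawelo 0.01
mm
exactdist
ratings
"""

results_to_pgn("cgos_results.txt", "games.pgn")
out = subprocess.run(["bayeselo"], input=COMMANDS,
                     capture_output=True, text=True)
print(out.stdout)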
Rémi: Another idea: in order to give more significance to the results, games
Rémi: between different versions of the same program should be excluded. These
Rémi: games tend to strongly overestimate the strength of the stronger version.

There is no way to do this without adding some more infrastructure. It is impossible to determine with complete confidence which version is which, and even if I could, how do I determine which version should represent the program family? There are simple heuristics that might work most of the time but are not completely reliable.

The convention I once suggested for assigning versions to programs isn't widely followed and isn't enforced. The idea was to version your program like this: "Lazarus-1-0", where everything before the first hyphen is the generic program name and everything after the first hyphen is the version number. But a malicious user could kick your program off by using your name (with a different version number.) It's possible to enforce this policy by letting the server impose it - your password applies to the characters before the first hyphen.

Even then, how does the server determine which version is representative? When I developed Lazarus I added many experimental versions which didn't make the cut but had more recent version numbers. I would not want those versions to "carry the flag" for the Lazarus family of programs. So the only way to get this working right is to add more infrastructure to CGOS.

I think what is better is to provide a way for users to specify that they don't want to play their own program. We could consider a "family" of programs to be specified by password. OR, it might be reasonable to just say that if you don't want your program to play different versions of itself, the password must match as well as all the characters before the first hyphen. So essentially you are forced to version with a hyphen if you want this behavior. Does that sound like a reasonable improvement to the system?
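To make that concrete, here is a hypothetical sketch of the check the server would run before pairing two players - none of this exists in CGOS today, and the "avoid_self" flag stands in for the proposed opt-out:

# Hypothetical pairing check for the proposal above - nothing like
# this is implemented in CGOS yet.
from dataclasses import dataclass

@dataclass
class Player:
    name: str          # e.g. "Lazarus-1-0"
    password: str
    avoid_self: bool   # proposed flag: "don't pair me against my own versions"

def family(name: str) -> str:
    """Everything before the first hyphen (the whole name if there is none)."""
    return name.split("-", 1)[0]

def may_pair(a: Player, b: Player) -> bool:
    """Refuse the pairing only when the name prefixes AND the passwords
    match - so a stranger reusing your prefix can't pose as family -
    and at least one side has opted out of self-play."""
    same_family = family(a.name) == family(b.name) and a.password == b.password
    return not (same_family and (a.avoid_self or b.avoid_self))

# Example: same prefix, same password, one side opted out -> no pairing.
lz1 = Player("Lazarus-1-0", "secret", avoid_self=True)
lz2 = Player("Lazarus-2-0", "secret", avoid_self=False)
assert not may_pair(lz1, lz2)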
Rémi Coulom wrote:
> Don Dailey wrote:
>> I put up a web page that displays EVERY player who has played at least
>> 200 games on CGOS.
>>
>> It uses the bayeselo program that Rémi authored.
>>
>> http://cgos.boardspace.net/9x9/hof.html
>>
>> I'm not sure I used the program correctly - it's rather complicated and
>> I'm not that great with statistics. If anyone is interested in the
>> settings I used I can provide that.
>>
>> - Don
>
> Another idea: in order to give more significance to the results, games
> between different versions of the same program should be excluded.
> These games tend to strongly overestimate the strength of the stronger
> version.
>
> Rémi