Here is a quick answer to most of your sticky points:

But first of all, thank you for your generous praise. I probably don't
deserve it (but I'll take what I can get :-)

An external tester will test for conformance and it will compare 2 bots,
one of which we "trust" as being conforming.   But the tester will not
be deterministic, it will throw random positions at the bots so that a
black box author cannot present it with something that has hard coded
answers.   Also, I don't want it to be tuned for any given position.  

I envision that you might be able to seed the tester with a random
number in order to duplicate the testing conditions and if this becomes
structured enough the "official" test would use a hidden standard seed.
This would be required so that different programs are not presented with
a different series of tests.  

GTP is pretty much a necessity and is also very much a standard.  It's
necessary for external game playing and testing and for having an
external tester.  It doesn't make sense to produce a different system.
We could make it spit out numbers that you key in to a spreadsheet of
some kind or you could do the math manually, but this becomes unwieldy
if the test is to be very sophisticated.   

In fact, communicating with programs is the whole "raison d'être"
(reason for being) of GTP.
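For anyone worried about the GTP side, the core of it is small. Below is a rough Python sketch of answering one command line, following the GTP 2 draft response format ("=" for success, "?" for failure, optional numeric id, terminated by a blank line). The function name `handle_gtp`, the `COMMANDS` table and the bot name "ExampleBot" are made up for this example, and the additional commands proposed for the reference bot are deliberately not shown.

```python
def handle_gtp(line, commands):
    """Answer one GTP command line using the handlers in `commands`.

    `commands` maps a command name to a function that takes the argument
    list and returns the response text. Comment-only and blank lines
    produce no response (an empty string).
    """
    line = line.split("#", 1)[0].strip()  # drop comments and whitespace
    if not line:
        return ""
    parts = line.split()
    cmd_id = ""
    if parts[0].isdigit():  # optional numeric id before the command name
        cmd_id = parts.pop(0)
    if not parts:
        return ""
    name, args = parts[0], parts[1:]
    handler = commands.get(name)
    if handler is None:
        return "?%s unknown command\n\n" % cmd_id
    try:
        text = handler(args)
    except Exception as exc:  # report any handler failure as a GTP error
        return ("?%s %s" % (cmd_id, exc)).rstrip() + "\n\n"
    return ("=%s %s" % (cmd_id, text)).rstrip() + "\n\n"


# Hypothetical command table; a real bot would add boardsize, play, genmove...
COMMANDS = {
    "protocol_version": lambda args: "2",
    "name": lambda args: "ExampleBot",
    "known_command": lambda args: "true" if args and args[0] in COMMANDS else "false",
}
```

A bot would read lines from standard input, pass each one to `handle_gtp`, and write the result to standard output; each command it supports is then just one more entry in the table.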

We can always publish numbers that people can use for informal checking.
But I definitely want some kind of conformance metric that is not just
ad-hoc.  

I really like your idea of massive automated testing to test
conformance, but you know this is extremely CPU intensive.  It would
take tens or hundreds of thousands of games to be able to say with high
confidence that 2 programs are functionally identical in strength.   So
I envision a primary test that runs relatively quickly and a more
comprehensive test based on game play for the most interesting programs
or for anyone willing to take it that far.
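As a back-of-the-envelope check on those game counts, here is a small Python sketch. It assumes independent games, no draws, a true win rate near 50%, and the usual normal approximation to the binomial; the function names are made up for this example.

```python
import math


def win_rate_margin(n_games, p=0.5, z=1.96):
    """Half-width of the ~95% normal-approximation interval for a win rate."""
    return z * math.sqrt(p * (1.0 - p) / n_games)


def games_needed(margin, p=0.5, z=1.96):
    """Games needed so that the interval half-width is at most `margin`."""
    return math.ceil((z / margin) ** 2 * p * (1.0 - p))
```

Under these assumptions, 2000 games pin the win rate down to only about plus or minus 2.2 percentage points. Getting to plus or minus 1 point takes roughly 9,600 games, and plus or minus 0.5 points roughly 38,400, which is why a game-based test gets expensive so fast.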

- Don





On Tue, 2008-10-14 at 22:17 +0200, Denis fidaali wrote:
> -----------------------
> Don said
> -----------------------
> I don't really want to be too involved in drafting this as it's not my
> forte.  However I hope someone else (who is a better writer) will be
> willing.  
> 
> +++ Answer +++
> +++++++++++
> What you have done up to now has great value. It gives us a strong basis
> to build on. I think that the more it pleases you, the better it
> will be able to serve as this basis. We have to help you,
> because it's just too much to ask you to do all the hard work.
> But you can't say that you are not skillful; that doesn't seem
> to be true to me. Making up a draft all alone is probably not a task
> anyone in his right mind feels comfortable with. But if enough people
> help you, it becomes doable :) What I like is that you seem quite
> experienced, so you easily get to the point and sort out the
> issues. If things are to be kept on track, we most certainly need
> someone who is able to do just that. And it seems to me that
> you're great at this :) Which by no means should result
> in you doing all the work. Arbitrating what should and what
> shouldn't be in the draft seems to be your forte :)
> 
> 
> 
> -----------------------
> Don said
> -----------------------
> > 
> > We'll probably have to get a bit deeper in the gtp-part ultimately.
> 
> Do you mean the explanation or the actual commands I added?   For the
> explanation we could point them to the excellent GTP draft standard and
> then explain this as 2 additional commands and then clarify what I
> wrong.
> 
> I made only 2 GTP commands as I wanted to be careful not to add anything
> that would likely impose a "barrier" to implementation and this
> influenced the specification in many places too.  
> 
> +++ Answer 1+++
> ++++++++++++
> Does the ref-nodes command return
> the number of moves that were made during the N simulations?
> 
> Is there another GTP command for setting N?
> 
> So I guess we strictly enforce the number of simulations then? For
> example, my N-thread implementation doesn't quite guarantee that it
> won't do a few more simulations than it should. Still, for most N big
> enough, it doesn't really change the average results. But if you take
> the number of moves with the aim of dividing it by exactly N, it won't
> give the right result ... unless I calculate what the number of moves
> should have been :)
> What I like about the average number of moves is that it gives strong
> evidence that the implementation is correct. But I don't really see it
> as an absolute need for the black-box testing.
> 
> I propose that we use both. First, the match against the reference bot
> (a 50% win/loss rate being the right value there for, say, 2000 games;
> the number of simulations is unspecified, but the same for the
> reference and to-be-tested bots).
> Then, to make sure, we should also have a command for getting the value
> of each legal move in a position.
> If the bot passes those two tests, with all sound parameters, then it
> is valid. The implementation can't use fancy tricks to get those two
> tests right, because we can come up with an infinity of positions to
> score, and we can use different values for the number of simulations.
> So it gives both a lot of freedom in how you implement it, and a really
> strong way of making sure that a bot conforms.
> 
> 
> +++ Answer 2+++
> ++++++++++++
> I really like the way you try (we all try) to zoom in
> on what's really the most meaningful. I think
> that we almost have it right now, as far as the
> board implementation is concerned (and the Monte Carlo & AMAF).
> 
> But GTP is also an important part. I would really have liked
> not to have to take GTP into account :) But then my bots wouldn't
> have been able to interact with the outside world.
> Still, I think we have to make it as easy as possible. I guess
> it's probably sound to just point to the GTP specification.
> Personally, I feel quite uncomfortable asking someone
> to read the GTP specification and work out all by himself
> which commands to implement and which are not worth it.
> I think that nearly as much time would be spent on the GTP part
> as on the first light-simulation part. I feel uncomfortable with that.
> 
> My personal feeling is that it would be best to provide an intermediate
> layer that is friendlier to simple AMAF-only bots. That would mean that
> the protocol part would be made as simple as it can be. It would be
> consistent with what we are trying to do with the AMAF-decision part ...
> But it means a lot more work for us. For one thing, we would have to
> provide a translator that would act as an intermediary between the
> simple-AMAF implementation and a true GTP protocol implementation. I
> think Java would make it easy to build a cross-platform translator to
> encapsulate the target simple-AMAF program.
> But then we would also have to devise a protocol that is indeed easier
> to implement, and more natural for the simple-AMAF implementation. I have
> been thinking about that for a long time, and I may have ideas. But once
> again, it means quite some work and effort.
> 
> (GTP is now hidden from me by layers and layers of helpers.)
> Here is the page I used:
> http://www.lysator.liu.se/~gunnar/gtp/gtp2-spec-draft2/gtp2-spec.html
> 
> 
> -----------------------
> Don said
> -----------------------
> I think anyone wishing to participate (officially) should submit their
> source code.   You can post numbers but it cannot be taken seriously
> without code and verification.  Perhaps we could require that AT LEAST
> the executable be submitted?
> 
> Right now my implementation is the reference implementation but I would
> consider a better written one.   The criteria for "better" are short,
> simple and readable,  not high performing or feature-rich.   
> - Don
> 
> +++ Answer +++
> ++++++++++++
> I like the black-box idea.
> First, I think it forces us to focus on what really matters.
> 
> I think that we should provide an easy way for people to submit their
> code (easier to say than to provide :) ).
> I think, indeed, that we should provide an even easier (and probably
> more manageable) way for them to provide the executable.
> And ultimately I think we should offer an online way of submitting a
> solution, much like CGOS is online.
> CGOS allows people to test programs that stay on remote machines. I
> guess that's what gives the most freedom. Potentially, if someone is
> paranoid enough about his code that he doesn't want to publish it,
> he may also be reluctant to give up the executable ... (it's so easy to
> disassemble some languages ...).
> I think a validation server that you could connect to, as you do with
> CGOS, would be neat :)
> 
> 
> -----------------------
> Don said
> -----------------------
> It's not clear to me whether others will respond, but it would be pretty
> cool and it would probably grow the computer go community at the same
> time.
> 
> +++ Answer +++
> ++++++++++++
> I think that this subject may well be the most important. It is to me.
> As Peter Drucker said so well: there is nothing sadder than doing
> really well something that does not need to be done at all.
> 
> Up until now, we have mostly tried to give a nice model for something
> that people who naturally get involved in computer go can't really
> avoid doing. So this specification has its value. Anyone really willing
> to do something concrete, and to back up his intuition with something
> consistent, will have to follow it at least once. And it'll definitely help
> this person to have a consistent frame to sustain his early efforts.
> It'll give him at least a base from which to ask some interesting questions.
> 
> Even so, he won't automatically be aware that this material exists.
> So here we come across one very important thing: visibility.
> The more satisfying the draft is, the more natural visibility it
> will get: people will point to it. I think it'll be better to minimize
> the proliferation of external links when possible. Of course the basic
> principles of go are probably explained very well somewhere, and
> it could be useful to point to that for those who have never played it.
> But as far as the mechanism of a simple go playout is concerned,
> I myself find it easier to read if it is a self-contained document.
> 
> I guess the best we can do for now is to say what we would like, and
> let you, Don, sort it all out for us :) And hope that somehow things
> will work out.
> At least, if they don't, I hope we'll all have had some fun. That doesn't
> mean that you should be the only one to do all the writing.
> 
> 
> I think that this dynamic of seeing the draft evolve slowly as suggestions
> are accepted and rejected is fascinating. And I like the way you have
> done it so far :)
> 
> 
> 
> 
> 
> ******************************
> Here is what I'd like to aim for:
> ******************************
> People
> (young people, enthusiasts, geeks, casual developers,
> language advocates)
> can easily get involved and interested in go
> programming by taking on a challenge that is interesting
> but not overly complicated or time-consuming.
> 
> It has to be satisfying somehow, both for elite
> developers and for more casual ones.
> 
> Ultimately, anyone should be able to pass the validity
> tests within 5 days of work at 3 hours a day.
> 
> We could then have much better arguments when asking
> people to back up their claims with simple experiments :)
> 
> I hope that once we have the AMAF challenge up and running,
> we can then (only then) propose some others (with reduced costs for us :) )
> 
> ***********************************************
> Here is how I got involved in go programming:
> ***********************************************
> As I recall it, I was once approached by a guy named Ivan. He was really
> excited about how Monte Carlo and UCT (i.e. MoGo) were pulling up
> the level of play. With minimal effort, all of a sudden, anyone could
> build a strong go program. That was really a revolution. Ever since I
> started playing go, I had dreamed of making go engines. But it just
> seemed to be too much work. Unsatisfying. So I delayed. But I still
> dreamed of it. So when Ivan came, we discussed the principle and the
> results a bit. We took a week to come up with the basis of a really
> simple Monte Carlo program. He had access to this list, but at that
> time I didn't. But I didn't really need it, because just by knowing
> the rules of play, it was all so straightforward.
> 
> I recall that we got the basic support for simulations (i.e. the board) up
> and running within two days of work. We learned later that there were
> many bugs, and we also learned to optimize the code a bit
> (up to 3 to 4 times faster, I think). We were both excited about
> having a fast board implementation, even though we chose Java.
> (All the attempts I have made with C gave performance very, very close
> to the Java implementation, using the same structures.)
> 
> Apparently, we naturally evolved all the principles that just about
> everybody seems to use. It's fascinating to realize that there are only
> one or two natural ways of implementing a simple (fast) board,
> algorithm-wise. The existence of this community has been a great support
> to us. In particular, CGOS and KGS gave us great targets to stay motivated.
> We could then watch our work interact with the rest of the world. I think
> we wouldn't have gotten involved without this. I'd probably be working
> on a Hex bot if it were not for this strong community. I have been
> playing go from time to time for years, and I don't play Hex. But Hex seems
> so much faster to run tests on :) The development and debugging cycle
> looks like it would be shorter too.
> 
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
