On Thu, 2008-10-23 at 13:33 -0200, Mark Boon wrote:
> Don,
>
> You're probably right and I'm misunderstanding how it's supposed to
> work.
>
> Let me quote the original description:
>
>    6. Scoring for game play uses AMAF - all moves as first. In the
>       play-outs, statistics are taken on moves played during the
>       play-outs. Statistics are taken only on moves that are played by
>       the side to move, and only if the move in question is being
>       played for the first time in the play-out (by either side.) A
>       win/loss record is kept for these moves.
>
> Somehow I had construed this to mean you're building a tree, where you
> only gather statistics when you enter a unique position (which is
> basically what would happen with UCT search expansion). If this is not
> what it means I'm totally in the dark as to what exactly this section
> does mean.
Let me give you a simple example where we set the level to a measly 2
play-outs. You play 2 "uniformly random games" that go like the
following. These are nonsense games that probably contain illegal
moves; I just made up the moves for the sake of example.

Black to move first:

   c3 d4 g5 f6 b2 g7 c3 d4 g7 pass pass   - black wins
   d4 b2 g7 c6 g5 a5 c3 d5 c7 pass pass   - black loses

In the first game black played: c3, g5, b2, c3, g7 ...

   but we cannot count g7 because white played it first,
   and we can only count c3 once because black played it twice.

So our statistics so far for black are:

   c3 - 1 win, 1 game
   g5 - 1 win, 1 game
   b2 - 1 win, 1 game

In the second game black played: d4, g7, g5, c3, c7 - black played all
of these moves first, but lost.

If you combine the statistics you get:

   c3   1 win    2 games   0.50
   g5   1 win    2 games   0.50
   b2   1 win    1 game    1.00
   d4   0 wins   1 game    0.00
   g7   0 wins   1 game    0.00
   c7   0 wins   1 game    0.00

The highest scoring move is b2, so that is what is played.

It's called AMAF because we only care whether black played the move; we
don't care WHEN he played it. So AMAF "cheats" by viewing a single game
as several separate games, getting many data points from 1 game instead
of just 1 data point. Please note that we don't care what WHITE did; we
only take statistics on the moves black made (although we ignore any
move that black didn't play first). So if white plays e5 and later
BLACK plays e5 (presumably after a capture), then we cannot count e5
because white played it first. I think if you study my example you will
understand.

> When you say you count all the playouts starting from an empty board,
> then I have no idea how our outcome can be different by 3-4 moves,
> which is coincidentally the average depth of a uniform tree of
> 1,000,000 moves on a 9x9 board.

Well, we are doing 2 completely different things - so it is surprising
to me that we actually came that close.

> When I run the Orego playout benchmark (without mercy rule) for
> 1,000,000 playouts, I get an average game-length of 114.5. This is the
> same number I'm getting.

You are probably doing what Orego does, not what my test specifies. I
have to say I am surprised that you read all of that into my spec. I
never once mentioned UCT, never gave a formula for UCT or anything of
the sort. How did you think you were conforming without even agreeing
on a UCT formula, or on whether the tree nodes are hashed or generated
like a linked tree?

> With your explanation I think I'm getting a bit closer to
> understanding what your reference implementation does. But I'd like to
> see some more clarity before I'll attempt to implement it. Especially
> criteria #6 quoted above and #3 need some disambiguation (if that's a
> word).

I think that's a word - I've seen it in the SAN document for chess,
which describes how to disambiguate moves that have the same
destination square and such. Anyway, I am not a good writer and had
hoped that someone who is better would help me make the spec clearer.
But at least I have a reference implementation with source code. When I
get farther along and create a web page, I will include my example
above (or something like it) if it isn't too confusing, and try to
improve it if nobody steps forward to help me. To me it seems perfectly
clear and understandable, but since I wrote it there is no possible way
to be objective about it. Of course I know that a non-go programmer, or
even a go programmer with no Monte Carlo Go experience, would have fits
with it, so I appreciate that it can probably be greatly improved.
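To make the bookkeeping in the example above concrete, here is a small
sketch in Python (purely illustrative - this is not my reference
implementation, and the function and variable names are just made up
for this message):

    # AMAF credit for the side to move (black in the example above):
    # a move is credited only if black played it, and only if that point
    # had not already been played earlier in the play-out by either side.
    def amaf_update(playout_moves, black_won, stats):
        seen = set()                      # points already played by anyone
        for i, move in enumerate(playout_moves):
            if move == "pass":
                continue
            black_to_move = (i % 2 == 0)  # black made moves 0, 2, 4, ...
            if black_to_move and move not in seen:
                wins, games = stats.get(move, (0, 0))
                stats[move] = (wins + (1 if black_won else 0), games + 1)
            seen.add(move)                # first occurrence blocks later credit

    stats = {}
    amaf_update("c3 d4 g5 f6 b2 g7 c3 d4 g7 pass pass".split(), True, stats)
    amaf_update("d4 b2 g7 c6 g5 a5 c3 d5 c7 pass pass".split(), False, stats)
    for move, (wins, games) in stats.items():
        print(move, wins, "/", games, "=", wins / games)

Running this on the two games above reproduces the table: b2 at 1.00,
c3 and g5 at 0.50, and d4, g7 and c7 at 0.00.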
I don't want it to be too stiff and formal; I want it to be easy for a
casual hacker to understand. Of course I don't mind if someone creates
a more formal description in addition to one that is easier to read.

> For example, in #3 I understand that the ending criteria is 2
> consecutive passes or the game-length exceeds N*N*3, where N is the
> board-size. The bit about 'at least 1 move gets tried...' is not clear
> to me. I think it should read:
>
>    Play-outs stop after 2 consecutive pass moves, OR when N*N*3 moves
>    have been completed (where N is the size of the board). Except that
>    at least 1 move gets tried.
>
> But that still leaves me the question how you can try at least one
> move when no legal moves are left?

Your rewording sounds perfect. The pass move is always legal, but a
pass move is generated ONLY when no other non-eye-filling legal move is
possible. The reason why at least 1 move should be tried is that
otherwise it is not possible to gather statistics. (There is a small
sketch of this play-out loop at the end of this message.)

> You may also need to include an exact definition of AMAF.

The definition I use is better than what I've seen in the papers, but I
will strive to improve its clarity.

- Don

> Mark
>
> On 23-Oct-08, at 11:19, Don Dailey wrote:
>
> > On Thu, 2008-10-23 at 09:38 -0200, Mark Boon wrote:
> >> Don,
> >>
> >> I have figured out the discrepancy in the average game length. As
> >> playout length I count from the start of the game, which gives me
> >> 114-115. I believe you count from the starting position where the
> >> playout starts. Because when I modify my code to do that I also get
> >> 111 moves per game on average.
> >>
> >> But I believe counting from the start is more meaningful if you're
> >> going to use the game-length as a reference number. Because otherwise
> >> you might get a different number depending on how explorative your
> >> search is. If your search tends to explore deeper lines in favour of
> >> wider lines you get a different number. Or at least you'll have to
> >> include the exploration-factor K in your UCT tree as a criteria.
> >
> > I think you are misunderstanding how this works. I am defining a very
> > simple non-UCT bot, and you are using terminology that implies
> > something different from what I am specifying.
> >
> > My timing is based solely on the opening position, so how would it
> > come out any different? We are both counting from the start of the
> > game in this specific case. (I'll make sure my spec makes this
> > unambiguous.)
> >
> > You also talk about how "explorative" your "search" is. The spec
> > doesn't talk about a search or exploration; you just do 1 million
> > uniformly random playouts from some specified position (the starting
> > position.)
> >
> > The score should be the total black wins divided by the total number
> > of games played, and for this specific test the komi should be 0.5. I
> > will build a more sophisticated test soon that you simply run your
> > bot against.
> >
> > The whole idea of this (for me) was to create the simplest reasonable
> > go benchmark that played a "reasonable" game of go (depending on your
> > definition of reasonable), so that hopefully many people might be
> > willing to go to the trouble to implement it. I'm not going for a
> > sophisticated UCT tree-based algorithm, because that would require a
> > considerably larger amount of effort for people to write, and it
> > would also be significantly more difficult to define precisely.
> > Having said that, you could easily take a UCT program and, with very
> > little effort, give it a "simple" mode that does what we are asking.
> >
> > The simple bot also serves as a stepping stone for beginners to start
> > implementing their own more capable bot, because they can be confident
> > that they got the details correct and bug-free (more or less). So you
> > could grow a UCT program from this.
> >
> > - Don
> >
> >> Mark
> >>
> >> On 22-Oct-08, at 20:57, Don Dailey wrote:
> >>
> >>> On Wed, 2008-10-22 at 20:29 -0200, Mark Boon wrote:
> >>>> On Wed, Oct 22, 2008 at 6:07 PM, Don Dailey <[EMAIL PROTECTED]> wrote:
> >>>>>
> >>>>> For one thing, komi is different. I used 0.5 for running this test.
> >>>>>
> >>>>> I would have used 0.0, but some implementations don't like even
> >>>>> komis.
> >>>>>
> >>>> But the komi should have no effect on the playout length. I started
> >>>> out with 103 moves, but that was because of a mercy-rule. Without it
> >>>> it's 114. You have 111 rather consistently. And I assume you don't
> >>>> use a mercy-rule. Nor super-ko checking, is that correct?
> >>>
> >>> The score is what I was looking at, but you are right about the
> >>> play-out length, because the play-outs don't care what komi is.
> >>>
> >>> Do you use the 3X rule? Have you checked the other specs?
> >>>
> >>> My guess is that your playouts are not uniformly random - that is
> >>> very easy to get wrong. Of course that is just a wild guess; it may
> >>> very well be something completely different.
> >>>
> >>> - Don
> >>>
> >>>> Another thing I have never looked at is AMAF. But that also
> >>>> shouldn't affect playout length, I assume.
> >>>>
> >>>> By the way, thanks for all the pointers to 'git' from everyone.
> >>>> It's new to me and at first glance the specs look good, so I'll
> >>>> definitely give it a go.
> >>>>
> >>>> Mark
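P.S. For anyone trying to implement the play-out loop from this thread,
here is a rough sketch in Python of one reading of the stopping rule
(2 consecutive passes, or N*N*3 moves, with at least 1 move tried) and
of the benchmark score. This is NOT my reference implementation;
legal_non_eye_moves, play, new_board and black_wins are hypothetical
helpers standing in for whatever your own program provides:

    import random

    def playout(board, N):
        length = 0
        consecutive_passes = 0
        while length < N * N * 3:                     # hard cap on length
            candidates = legal_non_eye_moves(board)   # hypothetical helper
            if candidates:
                move = random.choice(candidates)      # uniformly random choice
                consecutive_passes = 0
            else:
                move = "pass"                         # pass ONLY when forced
                consecutive_passes += 1
            play(board, move)                         # hypothetical helper
            length += 1
            # The pass counter starts at 0 inside the play-out, so at least
            # one move is always tried before the two-pass rule can stop it.
            if consecutive_passes >= 2:
                break
        return length, black_wins(board)              # hypothetical scorer

    def benchmark(games=1_000_000, N=9):
        wins = total_length = 0
        for _ in range(games):
            length, black_won = playout(new_board(N), N)   # empty board
            wins += black_won
            total_length += length
        # score = total black wins / total games, as discussed above
        return wins / games, total_length / games

The essential points are that moves are chosen uniformly at random from
the legal non-eye-filling moves, that a pass is generated only when that
list is empty, and that the game-length statistic here counts moves from
the position the play-out starts from (the empty board, in this test).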