Re: endgame (Was [computer-go] Re: Should 9x9 komi be 8.0 ?])

2008-03-10 Thread Petr Baudis
On Mon, Mar 10, 2008 at 02:33:03AM -0400, Michael Williams wrote:
> Jonas Kahn wrote:
>> out, kos can go on for long. I don't know what depth is attained in the
>> tree (by the way, I would really like to know), but I doubt it is that
>
> MoGo displays the depth of the principle variation in the stderr stream.

I have been wondering, does that include _any_ nodes, or only these
above certain number of playouts?  What is the playout threshold?

-- 
Petr "Pasky" Baudis
Whatever you can do, or dream you can, begin it.
Boldness has genius, power, and magic in it.-- J. W. von Goethe
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: endgame (Was [computer-go] Re: Should 9x9 komi be 8.0 ?])

2008-03-10 Thread Christoph Birk

On Mon, 10 Mar 2008, Petr Baudis wrote:

MoGo displays the depth of the principle variation in the stderr stream.


I have been wondering, does that include _any_ nodes, or only these
above certain number of playouts?  What is the playout threshold?


The 'principal variation' is usually the one that the program would
play against itself; at each level the one move with the highest
score with might (depending on the program) just be the one with
the most playouts.

Christoph

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: endgame (Was [computer-go] Re: Should 9x9 komi be 8.0 ?])

2008-03-10 Thread Jonas Kahn
On Mon, Mar 10, 2008 at 01:03:02PM -0700, Christoph Birk wrote:
> On Mon, 10 Mar 2008, Petr Baudis wrote:
>>> MoGo displays the depth of the principle variation in the stderr stream.
>>
>> I have been wondering, does that include _any_ nodes, or only these
>> above certain number of playouts?  What is the playout threshold?
>
> The 'principal variation' is usually the one that the program would
> play against itself; at each level the one move with the highest
> score with might (depending on the program) just be the one with
> the most playouts.

I guess Petr meant: does Mogo stop what it calls principal variation
when all nodes following the move have been visited less than say 100 or
1000 times, or does it follow the ``best'' moves until the very leaves
of the UCT tree ?

Jonas
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Optimal explore rates for plain UCT

2008-03-10 Thread Petr Baudis
  Hi,

On Sat, Mar 08, 2008 at 10:18:34AM +0100, Petr Baudis wrote:
>   (By the way, pachi1-*-light are UCT bots with completely light
> playouts with various UCB1 c values, if anyone wants to use that as
> reference. Surprisingly, it seems that my heavy playouts do not make big
> difference so far, though the rating is still very unstable.)

  after two days of play, it seems the ratings are fairly settled now.
For clarity, here is the UCB1 formula I use:

UCB1 = X_i + sqrt(log(N) * c / n)

Specifically, the c is withing the sqrt(); some of the papers put it in
front of the sqrt.

  Also, I expand UCT leaves at the second hit. This retains conservative
memory usage but it is important for strength - I saw huge strength
increase when I lowered this to 2 from the original value of 5.

  With 110k playouts per move and no domain knowledge in the playouts,
the ratings are now:

c=0.2  (pachi1-p0.2-light)  ELO 1627 (285 games)
c=1.0  (pachi1-p1.0-light)  ELO 1590 (120 games)
c=0.05 (pachi1-p0.05-light) ELO 1531 (286 games)
c=2.0  (pachi1-p2.0-light)  ELO 1511 (118 games)

  The main two messages of this post are: If you are developing own UCT
bot, with this number of playouts you should be aiming at least at 1600
ELO on CGOS. And choosing the right c can easily make a 100 ELO
difference! In particular, the "default" UCB1 c=2.0 appears to be very
unsuitable choice.

  I'm pretty sure my code is fairly well debugged now, but of course
there may be still bugs lurking; when I have put my bots on CGOS for the
first time it was awfully bug-ridden (and about 800 ELO worse ;-). What
ELO rating did pure UCT bots get historically with how many playouts?


P.S.: Looks like the heavy playouts I described in my other mail bring
no improvement to the bot strength at all, and mostly make it few ELO
weaker. :-( I'm rethinking my approaches now.

-- 
Petr "Pasky" Baudis
Whatever you can do, or dream you can, begin it.
Boldness has genius, power, and magic in it.-- J. W. von Goethe
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Optimal explore rates for plain UCT

2008-03-10 Thread Christoph Birk

On Mon, 10 Mar 2008, Petr Baudis wrote:

 With 110k playouts per move and no domain knowledge in the playouts,
the ratings are now:

c=0.2  (pachi1-p0.2-light)  ELO 1627 (285 games)
c=1.0  (pachi1-p1.0-light)  ELO 1590 (120 games)
c=0.05 (pachi1-p0.05-light) ELO 1531 (286 games)
c=2.0  (pachi1-p2.0-light)  ELO 1511 (118 games)


I have two "light" UCT bots on CGOS:
Name  #playouts c (*) CGOS-ELO
myCtest-V-00035 0.25  1508
myCtest-10k-UCT   1 0.25  1246

(*): I use c=0.5 outside the sqrt()

What is your 'create-new-node' threshold? I use 50.

Christoph

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Optimal explore rates for plain UCT

2008-03-10 Thread Don Dailey


Petr Baudis wrote:
>   Hi,
>
> On Sat, Mar 08, 2008 at 10:18:34AM +0100, Petr Baudis wrote:
>   
>>   (By the way, pachi1-*-light are UCT bots with completely light
>> playouts with various UCB1 c values, if anyone wants to use that as
>> reference. Surprisingly, it seems that my heavy playouts do not make big
>> difference so far, though the rating is still very unstable.)
>> 
>
>   after two days of play, it seems the ratings are fairly settled now.
> For clarity, here is the UCB1 formula I use:
>
>   UCB1 = X_i + sqrt(log(N) * c / n)
>
> Specifically, the c is withing the sqrt(); some of the papers put it in
> front of the sqrt.
>
>   Also, I expand UCT leaves at the second hit. This retains conservative
> memory usage but it is important for strength - I saw huge strength
> increase when I lowered this to 2 from the original value of 5.
>
>   With 110k playouts per move and no domain knowledge in the playouts,
> the ratings are now:
>
>   c=0.2  (pachi1-p0.2-light)  ELO 1627 (285 games)
>   c=1.0  (pachi1-p1.0-light)  ELO 1590 (120 games)
>   c=0.05 (pachi1-p0.05-light) ELO 1531 (286 games)
>   c=2.0  (pachi1-p2.0-light)  ELO 1511 (118 games)
>
>   The main two messages of this post are: If you are developing own UCT
> bot, with this number of playouts you should be aiming at least at 1600
> ELO on CGOS. And choosing the right c can easily make a 100 ELO
> difference! In particular, the "default" UCB1 c=2.0 appears to be very
> unsuitable choice.
>   
I think you may still have a bug.  You should get well over 1700 with
110,000 playouts, even if they are light playouts.  


>   I'm pretty sure my code is fairly well debugged now, but of course
> there may be still bugs lurking; when I have put my bots on CGOS for the
> first time it was awfully bug-ridden (and about 800 ELO worse ;-). What
> ELO rating did pure UCT bots get historically with how many playouts?
>   
FatMan does 20k playouts and has heavy play-outs, very similar to the
first paper where mogo described it's play-out strategy - basically
playing a random out of atari move or a local move that fits one of
their patterns.   It is rated 1800 on CGOS.The tree expansion policy
for nodes is based on the parent count,   not the child itself.So
once the parent has 100 play-outs children are expanded regardless of
the number of games they have seen.(Near the end of the game in 9x9
go some moves could be tried a few times before being expanded.) 

None of the other things are done in FatMan that many of the modern
programs are doing.I know that an older versions of Lazarus was
playing over 1700 with light play-outs and a formula I made up (which
doesn't works as well as the ucb stuff.)It was doing about 100k
playouts at the most. 

I'll bet you have some crazy bug.  

- Don




>
> P.S.: Looks like the heavy playouts I described in my other mail bring
> no improvement to the bot strength at all, and mostly make it few ELO
> weaker. :-( I'm rethinking my approaches now.
>
>   
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Optimal explore rates for plain UCT

2008-03-10 Thread Petr Baudis
On Mon, Mar 10, 2008 at 03:40:53PM -0700, Christoph Birk wrote:
> On Mon, 10 Mar 2008, Petr Baudis wrote:
>>  With 110k playouts per move and no domain knowledge in the playouts,
>> the ratings are now:
>>
>>  c=0.2  (pachi1-p0.2-light)  ELO 1627 (285 games)
>>  c=1.0  (pachi1-p1.0-light)  ELO 1590 (120 games)
>>  c=0.05 (pachi1-p0.05-light) ELO 1531 (286 games)
>>  c=2.0  (pachi1-p2.0-light)  ELO 1511 (118 games)
>
> I have two "light" UCT bots on CGOS:
> Name  #playouts c (*) CGOS-ELO
> myCtest-V-00035 0.25  1508
> myCtest-10k-UCT   1 0.25  1246
>
> (*): I use c=0.5 outside the sqrt()
>
> What is your 'create-new-node' threshold? I use 50.

I actually added that to my mail at the last minute: "Also, I expand UCT
leaves at the second hit. This retains conservative memory usage but it
is important for strength - I saw huge strength increase when I lowered
this to 2 from the original value of 5."

Even with just threshold 2, for 110k playouts I need only about 20M of
memory for the tree, so I'm actually wondering about how to improve my
program by spending more memory usefully. ;-)

-- 
Petr "Pasky" Baudis
Whatever you can do, or dream you can, begin it.
Boldness has genius, power, and magic in it.-- J. W. von Goethe
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Optimal explore rates for plain UCT

2008-03-10 Thread Petr Baudis
On Mon, Mar 10, 2008 at 06:57:07PM -0400, Don Dailey wrote:
> I think you may still have a bug.  You should get well over 1700 with
> 110,000 playouts, even if they are light playouts.  

Hmmm... That is going to be some tough debugging I suspect.

> >   I'm pretty sure my code is fairly well debugged now, but of course
> > there may be still bugs lurking; when I have put my bots on CGOS for the
> > first time it was awfully bug-ridden (and about 800 ELO worse ;-). What
> > ELO rating did pure UCT bots get historically with how many playouts?
> >   
> FatMan does 20k playouts and has heavy play-outs, very similar to the
> first paper where mogo described it's play-out strategy - basically
> playing a random out of atari move or a local move that fits one of
> their patterns.   It is rated 1800 on CGOS.The tree expansion policy
> for nodes is based on the parent count,   not the child itself.So
> once the parent has 100 play-outs children are expanded regardless of
> the number of games they have seen.(Near the end of the game in 9x9
> go some moves could be tried a few times before being expanded.) 

Oh, interesting! I must have misread libEGO code which seems to use
similar thresholds.

What is the justification of using the parent playout count instead of
the node playout count itself?

-- 
Petr "Pasky" Baudis
Whatever you can do, or dream you can, begin it.
Boldness has genius, power, and magic in it.-- J. W. von Goethe
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Optimal explore rates for plain UCT

2008-03-10 Thread Don Dailey


Petr Baudis wrote:
> On Mon, Mar 10, 2008 at 06:57:07PM -0400, Don Dailey wrote:
>   
>> I think you may still have a bug.  You should get well over 1700 with
>> 110,000 playouts, even if they are light playouts.  
>> 
>
> Hmmm... That is going to be some tough debugging I suspect.
>   
I'm working on a new program, starting from scratch.   When it gets far
enough along I will compare my numbers (and ELO) to yours.


>   
>>>   I'm pretty sure my code is fairly well debugged now, but of course
>>> there may be still bugs lurking; when I have put my bots on CGOS for the
>>> first time it was awfully bug-ridden (and about 800 ELO worse ;-). What
>>> ELO rating did pure UCT bots get historically with how many playouts?
>>>   
>>>   
>> FatMan does 20k playouts and has heavy play-outs, very similar to the
>> first paper where mogo described it's play-out strategy - basically
>> playing a random out of atari move or a local move that fits one of
>> their patterns.   It is rated 1800 on CGOS.The tree expansion policy
>> for nodes is based on the parent count,   not the child itself.So
>> once the parent has 100 play-outs children are expanded regardless of
>> the number of games they have seen.(Near the end of the game in 9x9
>> go some moves could be tried a few times before being expanded.) 
>> 
>
> Oh, interesting! I must have misread libEGO code which seems to use
> similar thresholds.
>
> What is the justification of using the parent playout count instead of
> the node playout count itself?
>
>   
I don't know if it makes much difference how this is done, and I don't
know how everybody else is doing it.  I allocate all the children at
once and do not have to store pointers to each child,  just one pointer
to the first child and a counter so that I know how many children there
are.   On average I'm probably expanding every other time in the middle
of the game. 

I preallocate all the memory for the nodes when the program starts
instead of using malloc and free, and I assume most others are doing
something similar.

Here is my basic data structure:

typedef struct ntag
{
  intwins;
  intgames;
  intchild_count;  // zero until parent wins > 100
  struct  ntag   *children;// a pointer to a place in the big "node
pool"
  mv_t   mv;   // move that got us here.

} node_t;

When the child nodes are allocated, they are done all at once with
this code - where cc is the number of fully legal child nodes:

   // reserve space for cc entries
n->child = &(pool[pool_count]);
pool_count += cc;

overflow checking is done outside of the search routine.

There are almost certainly better schemes,   I just used what occurred
to me to be the easiest and quickest to implement without working too
hard at it.

Some programs hash each position and the tree is more abstract, no
pointers  just positions leading to other positions by zobrist hash keys
in a hash table.  

My scheme probably wastes a lot of space on nodes that are left
unvisited at the leaves of the tree.   But I don't waste much on storing
pointers since I keep them in an array. 

What is the state of the art on this?   How is everyone else doing it?


- Don








___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Optimal explore rates for plain UCT

2008-03-10 Thread Petr Baudis
  Hi,

On Mon, Mar 10, 2008 at 08:07:14PM -0400, Don Dailey wrote:
> > What is the justification of using the parent playout count instead of
> > the node playout count itself?
> >
> >   
> I don't know if it makes much difference how this is done, and I don't
> know how everybody else is doing it.  I allocate all the children at
> once and do not have to store pointers to each child,  just one pointer
> to the first child and a counter so that I know how many children there
> are.   On average I'm probably expanding every other time in the middle
> of the game. 

I really so far use just a completely unoptimized

struct tree_node {
/*
 *+--+
 *| node |
 *+--+
 *  / <- parent
 * +--+   v- sibling +--+
 * | node |  | node |
 * +--+  +--+
 *| <- children  |
 * +--+   +--+   +--+   +--+
 * | node | - | node |   | node | - | node |
 * +--+   +--+   +--+   +--+
 */
struct tree_node *parent, *sibling, *children;

coord_t coord;
int playouts; // # of playouts coming through this node
int wins; // # of wins coming through this node
float value; // wins/playouts
};

even with no memory pools, every node allocated separately. This is one
of the things I'm likely to optimize when I get to that again, but so
far my feeling is that I'm fast and small enough not to really bother.
;-) And I can be sure that there are no bugs at least in this part of
the code.

-- 
Petr "Pasky" Baudis
Whatever you can do, or dream you can, begin it.
Boldness has genius, power, and magic in it.-- J. W. von Goethe
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Optimal explore rates for plain UCT

2008-03-10 Thread Christoph Birk

On Tue, 11 Mar 2008, Petr Baudis wrote:

On Mon, Mar 10, 2008 at 06:57:07PM -0400, Don Dailey wrote:

I think you may still have a bug.  You should get well over 1700 with
110,000 playouts, even if they are light playouts.


I will run myCtest with 110k-playout, c=0.25 and node creation
after the 2nd visit ... let's see what its ELO rating will be
in a couple of days.

Christoph
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Optimal explore rates for plain UCT

2008-03-10 Thread Christoph Birk

On Mon, 10 Mar 2008, Christoph Birk wrote:

On Tue, 11 Mar 2008, Petr Baudis wrote:

On Mon, Mar 10, 2008 at 06:57:07PM -0400, Don Dailey wrote:

I think you may still have a bug.  You should get well over 1700 with
110,000 playouts, even if they are light playouts.


I will run myCtest with 110k-playout, c=0.25 and node creation
after the 2nd visit ... let's see what its ELO rating will be
in a couple of days.


Sorry, I just realized I cannot do 110k playouts because my
implementation is too slow.
I suggest you run a 'pachi-0.25-light-50k' that just uses
5 playouts. That way you can compare it to 'myCtest-V-0003'.

BTW: I count the new-node threshold like Don from the parent
 node, so 50 not far from your '2'.

Christoph
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Optimal explore rates for plain UCT

2008-03-10 Thread Petr Baudis
On Mon, Mar 10, 2008 at 05:36:14PM -0700, Christoph Birk wrote:
> On Mon, 10 Mar 2008, Christoph Birk wrote:
>> On Tue, 11 Mar 2008, Petr Baudis wrote:
>>> On Mon, Mar 10, 2008 at 06:57:07PM -0400, Don Dailey wrote:
 I think you may still have a bug.  You should get well over 1700 with
 110,000 playouts, even if they are light playouts.
>>
>> I will run myCtest with 110k-playout, c=0.25 and node creation
>> after the 2nd visit ... let's see what its ELO rating will be
>> in a couple of days.
>
> Sorry, I just realized I cannot do 110k playouts because my
> implementation is too slow.
> I suggest you run a 'pachi-0.25-light-50k' that just uses
> 5 playouts. That way you can compare it to 'myCtest-V-0003'.

Sure, it now won its first game as pachi1-p0.25-li50k. It will be
interesting how much difference number of playouts makes for me too.

> BTW: I count the new-node threshold like Don from the parent
>  node, so 50 not far from your '2'.

Hmm, I really wonder where this comes from, the idea seems quite
unnatural to me.

-- 
Petr "Pasky" Baudis
Whatever you can do, or dream you can, begin it.
Boldness has genius, power, and magic in it.-- J. W. von Goethe
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Optimal explore rates for plain UCT

2008-03-10 Thread Christoph Birk

On Tue, 11 Mar 2008, Petr Baudis wrote:

BTW: I count the new-node threshold like Don from the parent
 node, so 50 not far from your '2'.


Hmm, I really wonder where this comes from, the idea seems quite
unnatural to me.


Well, child-nodes are created by the parent. The parent keeps
a count of how many times it (the parent) has been visited, and
holds off creating the children until a threshold is exceeded.

Like Don, I create all children of a node at once, as an
array, so no pointers to individual children is necessary,
just a pointer to the first sibling.

Seem very natural to me :-)

Christoph
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Optimal explore rates for plain UCT

2008-03-10 Thread Don Dailey
The key point, good or bad, is that I allocate all the legal children
nodes as a group.I manage my own memory pool and just hand out nodes
sequentially.I don't need to maintain pointers to each child, only a
single pointer to the first child.   So instead of needing 4 extra bytes
per node for pointers to the data,  I need something that amortizes to
something near zero.  

Your method is to allocate 1 node when it's been visited once or twice -
very natural I agree.   My method is to allocate all the children at
once, and wait until the parent has been visited some number of times
(currently 100).   If there are 50 legal moves, that gives on average
about 1 node allocation every 2 visits, which is what you said you do.  

Of course my allocation is not exact like yours.  You expand exactly
once every 2 visits,  but mine is just an average,   some nodes may get
expanded after only 1 visit  and others may get visited several times
before getting expanded.  

I don't really know if any of this matters much.

- Don




Petr Baudis wrote:
> On Mon, Mar 10, 2008 at 05:36:14PM -0700, Christoph Birk wrote:
>   
>> On Mon, 10 Mar 2008, Christoph Birk wrote:
>> 
>>> On Tue, 11 Mar 2008, Petr Baudis wrote:
>>>   
 On Mon, Mar 10, 2008 at 06:57:07PM -0400, Don Dailey wrote:
 
> I think you may still have a bug.  You should get well over 1700 with
> 110,000 playouts, even if they are light playouts.
>   
>>> I will run myCtest with 110k-playout, c=0.25 and node creation
>>> after the 2nd visit ... let's see what its ELO rating will be
>>> in a couple of days.
>>>   
>> Sorry, I just realized I cannot do 110k playouts because my
>> implementation is too slow.
>> I suggest you run a 'pachi-0.25-light-50k' that just uses
>> 5 playouts. That way you can compare it to 'myCtest-V-0003'.
>> 
>
> Sure, it now won its first game as pachi1-p0.25-li50k. It will be
> interesting how much difference number of playouts makes for me too.
>
>   
>> BTW: I count the new-node threshold like Don from the parent
>>  node, so 50 not far from your '2'.
>> 
>
> Hmm, I really wonder where this comes from, the idea seems quite
> unnatural to me.
>
>   
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Optimal explore rates for plain UCT

2008-03-10 Thread Jason House

On Mar 10, 2008, at 8:07 PM, Don Dailey <[EMAIL PROTECTED]> wrote:


Some programs hash each position and the tree is more abstract, no  
pointers  just positions leading to other positions by zobrist hash  
keys in a hash table.
My scheme probably wastes a lot of space on nodes that are left  
unvisited at the leaves of the tree.   But I don't waste much on  
storing pointers since I keep them in an array.


What is the state of the art on this?   How is everyone else doing it?


Before creating a child, I calculate hash keys like I would for  
situational super ko. If there's a hit in the transposition table, I  
reuse the pointer. If not, I create and add. It's simple to implement  
and should significantly reduce searches.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/