Re: endgame (Was [computer-go] Re: Should 9x9 komi be 8.0 ?])
On Mon, Mar 10, 2008 at 02:33:03AM -0400, Michael Williams wrote: > Jonas Kahn wrote: >> out, kos can go on for long. I don't know what depth is attained in the >> tree (by the way, I would really like to know), but I doubt it is that > > MoGo displays the depth of the principle variation in the stderr stream. I have been wondering, does that include _any_ nodes, or only these above certain number of playouts? What is the playout threshold? -- Petr "Pasky" Baudis Whatever you can do, or dream you can, begin it. Boldness has genius, power, and magic in it.-- J. W. von Goethe ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: endgame (Was [computer-go] Re: Should 9x9 komi be 8.0 ?])
On Mon, 10 Mar 2008, Petr Baudis wrote: MoGo displays the depth of the principle variation in the stderr stream. I have been wondering, does that include _any_ nodes, or only these above certain number of playouts? What is the playout threshold? The 'principal variation' is usually the one that the program would play against itself; at each level the one move with the highest score with might (depending on the program) just be the one with the most playouts. Christoph ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: endgame (Was [computer-go] Re: Should 9x9 komi be 8.0 ?])
On Mon, Mar 10, 2008 at 01:03:02PM -0700, Christoph Birk wrote: > On Mon, 10 Mar 2008, Petr Baudis wrote: >>> MoGo displays the depth of the principle variation in the stderr stream. >> >> I have been wondering, does that include _any_ nodes, or only these >> above certain number of playouts? What is the playout threshold? > > The 'principal variation' is usually the one that the program would > play against itself; at each level the one move with the highest > score with might (depending on the program) just be the one with > the most playouts. I guess Petr meant: does Mogo stop what it calls principal variation when all nodes following the move have been visited less than say 100 or 1000 times, or does it follow the ``best'' moves until the very leaves of the UCT tree ? Jonas ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
[computer-go] Optimal explore rates for plain UCT
Hi, On Sat, Mar 08, 2008 at 10:18:34AM +0100, Petr Baudis wrote: > (By the way, pachi1-*-light are UCT bots with completely light > playouts with various UCB1 c values, if anyone wants to use that as > reference. Surprisingly, it seems that my heavy playouts do not make big > difference so far, though the rating is still very unstable.) after two days of play, it seems the ratings are fairly settled now. For clarity, here is the UCB1 formula I use: UCB1 = X_i + sqrt(log(N) * c / n) Specifically, the c is withing the sqrt(); some of the papers put it in front of the sqrt. Also, I expand UCT leaves at the second hit. This retains conservative memory usage but it is important for strength - I saw huge strength increase when I lowered this to 2 from the original value of 5. With 110k playouts per move and no domain knowledge in the playouts, the ratings are now: c=0.2 (pachi1-p0.2-light) ELO 1627 (285 games) c=1.0 (pachi1-p1.0-light) ELO 1590 (120 games) c=0.05 (pachi1-p0.05-light) ELO 1531 (286 games) c=2.0 (pachi1-p2.0-light) ELO 1511 (118 games) The main two messages of this post are: If you are developing own UCT bot, with this number of playouts you should be aiming at least at 1600 ELO on CGOS. And choosing the right c can easily make a 100 ELO difference! In particular, the "default" UCB1 c=2.0 appears to be very unsuitable choice. I'm pretty sure my code is fairly well debugged now, but of course there may be still bugs lurking; when I have put my bots on CGOS for the first time it was awfully bug-ridden (and about 800 ELO worse ;-). What ELO rating did pure UCT bots get historically with how many playouts? P.S.: Looks like the heavy playouts I described in my other mail bring no improvement to the bot strength at all, and mostly make it few ELO weaker. :-( I'm rethinking my approaches now. -- Petr "Pasky" Baudis Whatever you can do, or dream you can, begin it. Boldness has genius, power, and magic in it.-- J. W. von Goethe ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Optimal explore rates for plain UCT
On Mon, 10 Mar 2008, Petr Baudis wrote: With 110k playouts per move and no domain knowledge in the playouts, the ratings are now: c=0.2 (pachi1-p0.2-light) ELO 1627 (285 games) c=1.0 (pachi1-p1.0-light) ELO 1590 (120 games) c=0.05 (pachi1-p0.05-light) ELO 1531 (286 games) c=2.0 (pachi1-p2.0-light) ELO 1511 (118 games) I have two "light" UCT bots on CGOS: Name #playouts c (*) CGOS-ELO myCtest-V-00035 0.25 1508 myCtest-10k-UCT 1 0.25 1246 (*): I use c=0.5 outside the sqrt() What is your 'create-new-node' threshold? I use 50. Christoph ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Optimal explore rates for plain UCT
Petr Baudis wrote: > Hi, > > On Sat, Mar 08, 2008 at 10:18:34AM +0100, Petr Baudis wrote: > >> (By the way, pachi1-*-light are UCT bots with completely light >> playouts with various UCB1 c values, if anyone wants to use that as >> reference. Surprisingly, it seems that my heavy playouts do not make big >> difference so far, though the rating is still very unstable.) >> > > after two days of play, it seems the ratings are fairly settled now. > For clarity, here is the UCB1 formula I use: > > UCB1 = X_i + sqrt(log(N) * c / n) > > Specifically, the c is withing the sqrt(); some of the papers put it in > front of the sqrt. > > Also, I expand UCT leaves at the second hit. This retains conservative > memory usage but it is important for strength - I saw huge strength > increase when I lowered this to 2 from the original value of 5. > > With 110k playouts per move and no domain knowledge in the playouts, > the ratings are now: > > c=0.2 (pachi1-p0.2-light) ELO 1627 (285 games) > c=1.0 (pachi1-p1.0-light) ELO 1590 (120 games) > c=0.05 (pachi1-p0.05-light) ELO 1531 (286 games) > c=2.0 (pachi1-p2.0-light) ELO 1511 (118 games) > > The main two messages of this post are: If you are developing own UCT > bot, with this number of playouts you should be aiming at least at 1600 > ELO on CGOS. And choosing the right c can easily make a 100 ELO > difference! In particular, the "default" UCB1 c=2.0 appears to be very > unsuitable choice. > I think you may still have a bug. You should get well over 1700 with 110,000 playouts, even if they are light playouts. > I'm pretty sure my code is fairly well debugged now, but of course > there may be still bugs lurking; when I have put my bots on CGOS for the > first time it was awfully bug-ridden (and about 800 ELO worse ;-). What > ELO rating did pure UCT bots get historically with how many playouts? > FatMan does 20k playouts and has heavy play-outs, very similar to the first paper where mogo described it's play-out strategy - basically playing a random out of atari move or a local move that fits one of their patterns. It is rated 1800 on CGOS.The tree expansion policy for nodes is based on the parent count, not the child itself.So once the parent has 100 play-outs children are expanded regardless of the number of games they have seen.(Near the end of the game in 9x9 go some moves could be tried a few times before being expanded.) None of the other things are done in FatMan that many of the modern programs are doing.I know that an older versions of Lazarus was playing over 1700 with light play-outs and a formula I made up (which doesn't works as well as the ucb stuff.)It was doing about 100k playouts at the most. I'll bet you have some crazy bug. - Don > > P.S.: Looks like the heavy playouts I described in my other mail bring > no improvement to the bot strength at all, and mostly make it few ELO > weaker. :-( I'm rethinking my approaches now. > > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Optimal explore rates for plain UCT
On Mon, Mar 10, 2008 at 03:40:53PM -0700, Christoph Birk wrote: > On Mon, 10 Mar 2008, Petr Baudis wrote: >> With 110k playouts per move and no domain knowledge in the playouts, >> the ratings are now: >> >> c=0.2 (pachi1-p0.2-light) ELO 1627 (285 games) >> c=1.0 (pachi1-p1.0-light) ELO 1590 (120 games) >> c=0.05 (pachi1-p0.05-light) ELO 1531 (286 games) >> c=2.0 (pachi1-p2.0-light) ELO 1511 (118 games) > > I have two "light" UCT bots on CGOS: > Name #playouts c (*) CGOS-ELO > myCtest-V-00035 0.25 1508 > myCtest-10k-UCT 1 0.25 1246 > > (*): I use c=0.5 outside the sqrt() > > What is your 'create-new-node' threshold? I use 50. I actually added that to my mail at the last minute: "Also, I expand UCT leaves at the second hit. This retains conservative memory usage but it is important for strength - I saw huge strength increase when I lowered this to 2 from the original value of 5." Even with just threshold 2, for 110k playouts I need only about 20M of memory for the tree, so I'm actually wondering about how to improve my program by spending more memory usefully. ;-) -- Petr "Pasky" Baudis Whatever you can do, or dream you can, begin it. Boldness has genius, power, and magic in it.-- J. W. von Goethe ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Optimal explore rates for plain UCT
On Mon, Mar 10, 2008 at 06:57:07PM -0400, Don Dailey wrote: > I think you may still have a bug. You should get well over 1700 with > 110,000 playouts, even if they are light playouts. Hmmm... That is going to be some tough debugging I suspect. > > I'm pretty sure my code is fairly well debugged now, but of course > > there may be still bugs lurking; when I have put my bots on CGOS for the > > first time it was awfully bug-ridden (and about 800 ELO worse ;-). What > > ELO rating did pure UCT bots get historically with how many playouts? > > > FatMan does 20k playouts and has heavy play-outs, very similar to the > first paper where mogo described it's play-out strategy - basically > playing a random out of atari move or a local move that fits one of > their patterns. It is rated 1800 on CGOS.The tree expansion policy > for nodes is based on the parent count, not the child itself.So > once the parent has 100 play-outs children are expanded regardless of > the number of games they have seen.(Near the end of the game in 9x9 > go some moves could be tried a few times before being expanded.) Oh, interesting! I must have misread libEGO code which seems to use similar thresholds. What is the justification of using the parent playout count instead of the node playout count itself? -- Petr "Pasky" Baudis Whatever you can do, or dream you can, begin it. Boldness has genius, power, and magic in it.-- J. W. von Goethe ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Optimal explore rates for plain UCT
Petr Baudis wrote: > On Mon, Mar 10, 2008 at 06:57:07PM -0400, Don Dailey wrote: > >> I think you may still have a bug. You should get well over 1700 with >> 110,000 playouts, even if they are light playouts. >> > > Hmmm... That is going to be some tough debugging I suspect. > I'm working on a new program, starting from scratch. When it gets far enough along I will compare my numbers (and ELO) to yours. > >>> I'm pretty sure my code is fairly well debugged now, but of course >>> there may be still bugs lurking; when I have put my bots on CGOS for the >>> first time it was awfully bug-ridden (and about 800 ELO worse ;-). What >>> ELO rating did pure UCT bots get historically with how many playouts? >>> >>> >> FatMan does 20k playouts and has heavy play-outs, very similar to the >> first paper where mogo described it's play-out strategy - basically >> playing a random out of atari move or a local move that fits one of >> their patterns. It is rated 1800 on CGOS.The tree expansion policy >> for nodes is based on the parent count, not the child itself.So >> once the parent has 100 play-outs children are expanded regardless of >> the number of games they have seen.(Near the end of the game in 9x9 >> go some moves could be tried a few times before being expanded.) >> > > Oh, interesting! I must have misread libEGO code which seems to use > similar thresholds. > > What is the justification of using the parent playout count instead of > the node playout count itself? > > I don't know if it makes much difference how this is done, and I don't know how everybody else is doing it. I allocate all the children at once and do not have to store pointers to each child, just one pointer to the first child and a counter so that I know how many children there are. On average I'm probably expanding every other time in the middle of the game. I preallocate all the memory for the nodes when the program starts instead of using malloc and free, and I assume most others are doing something similar. Here is my basic data structure: typedef struct ntag { intwins; intgames; intchild_count; // zero until parent wins > 100 struct ntag *children;// a pointer to a place in the big "node pool" mv_t mv; // move that got us here. } node_t; When the child nodes are allocated, they are done all at once with this code - where cc is the number of fully legal child nodes: // reserve space for cc entries n->child = &(pool[pool_count]); pool_count += cc; overflow checking is done outside of the search routine. There are almost certainly better schemes, I just used what occurred to me to be the easiest and quickest to implement without working too hard at it. Some programs hash each position and the tree is more abstract, no pointers just positions leading to other positions by zobrist hash keys in a hash table. My scheme probably wastes a lot of space on nodes that are left unvisited at the leaves of the tree. But I don't waste much on storing pointers since I keep them in an array. What is the state of the art on this? How is everyone else doing it? - Don ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Optimal explore rates for plain UCT
Hi, On Mon, Mar 10, 2008 at 08:07:14PM -0400, Don Dailey wrote: > > What is the justification of using the parent playout count instead of > > the node playout count itself? > > > > > I don't know if it makes much difference how this is done, and I don't > know how everybody else is doing it. I allocate all the children at > once and do not have to store pointers to each child, just one pointer > to the first child and a counter so that I know how many children there > are. On average I'm probably expanding every other time in the middle > of the game. I really so far use just a completely unoptimized struct tree_node { /* *+--+ *| node | *+--+ * / <- parent * +--+ v- sibling +--+ * | node | | node | * +--+ +--+ *| <- children | * +--+ +--+ +--+ +--+ * | node | - | node | | node | - | node | * +--+ +--+ +--+ +--+ */ struct tree_node *parent, *sibling, *children; coord_t coord; int playouts; // # of playouts coming through this node int wins; // # of wins coming through this node float value; // wins/playouts }; even with no memory pools, every node allocated separately. This is one of the things I'm likely to optimize when I get to that again, but so far my feeling is that I'm fast and small enough not to really bother. ;-) And I can be sure that there are no bugs at least in this part of the code. -- Petr "Pasky" Baudis Whatever you can do, or dream you can, begin it. Boldness has genius, power, and magic in it.-- J. W. von Goethe ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Optimal explore rates for plain UCT
On Tue, 11 Mar 2008, Petr Baudis wrote: On Mon, Mar 10, 2008 at 06:57:07PM -0400, Don Dailey wrote: I think you may still have a bug. You should get well over 1700 with 110,000 playouts, even if they are light playouts. I will run myCtest with 110k-playout, c=0.25 and node creation after the 2nd visit ... let's see what its ELO rating will be in a couple of days. Christoph ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Optimal explore rates for plain UCT
On Mon, 10 Mar 2008, Christoph Birk wrote: On Tue, 11 Mar 2008, Petr Baudis wrote: On Mon, Mar 10, 2008 at 06:57:07PM -0400, Don Dailey wrote: I think you may still have a bug. You should get well over 1700 with 110,000 playouts, even if they are light playouts. I will run myCtest with 110k-playout, c=0.25 and node creation after the 2nd visit ... let's see what its ELO rating will be in a couple of days. Sorry, I just realized I cannot do 110k playouts because my implementation is too slow. I suggest you run a 'pachi-0.25-light-50k' that just uses 5 playouts. That way you can compare it to 'myCtest-V-0003'. BTW: I count the new-node threshold like Don from the parent node, so 50 not far from your '2'. Christoph ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Optimal explore rates for plain UCT
On Mon, Mar 10, 2008 at 05:36:14PM -0700, Christoph Birk wrote: > On Mon, 10 Mar 2008, Christoph Birk wrote: >> On Tue, 11 Mar 2008, Petr Baudis wrote: >>> On Mon, Mar 10, 2008 at 06:57:07PM -0400, Don Dailey wrote: I think you may still have a bug. You should get well over 1700 with 110,000 playouts, even if they are light playouts. >> >> I will run myCtest with 110k-playout, c=0.25 and node creation >> after the 2nd visit ... let's see what its ELO rating will be >> in a couple of days. > > Sorry, I just realized I cannot do 110k playouts because my > implementation is too slow. > I suggest you run a 'pachi-0.25-light-50k' that just uses > 5 playouts. That way you can compare it to 'myCtest-V-0003'. Sure, it now won its first game as pachi1-p0.25-li50k. It will be interesting how much difference number of playouts makes for me too. > BTW: I count the new-node threshold like Don from the parent > node, so 50 not far from your '2'. Hmm, I really wonder where this comes from, the idea seems quite unnatural to me. -- Petr "Pasky" Baudis Whatever you can do, or dream you can, begin it. Boldness has genius, power, and magic in it.-- J. W. von Goethe ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Optimal explore rates for plain UCT
On Tue, 11 Mar 2008, Petr Baudis wrote: BTW: I count the new-node threshold like Don from the parent node, so 50 not far from your '2'. Hmm, I really wonder where this comes from, the idea seems quite unnatural to me. Well, child-nodes are created by the parent. The parent keeps a count of how many times it (the parent) has been visited, and holds off creating the children until a threshold is exceeded. Like Don, I create all children of a node at once, as an array, so no pointers to individual children is necessary, just a pointer to the first sibling. Seem very natural to me :-) Christoph ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Optimal explore rates for plain UCT
The key point, good or bad, is that I allocate all the legal children nodes as a group.I manage my own memory pool and just hand out nodes sequentially.I don't need to maintain pointers to each child, only a single pointer to the first child. So instead of needing 4 extra bytes per node for pointers to the data, I need something that amortizes to something near zero. Your method is to allocate 1 node when it's been visited once or twice - very natural I agree. My method is to allocate all the children at once, and wait until the parent has been visited some number of times (currently 100). If there are 50 legal moves, that gives on average about 1 node allocation every 2 visits, which is what you said you do. Of course my allocation is not exact like yours. You expand exactly once every 2 visits, but mine is just an average, some nodes may get expanded after only 1 visit and others may get visited several times before getting expanded. I don't really know if any of this matters much. - Don Petr Baudis wrote: > On Mon, Mar 10, 2008 at 05:36:14PM -0700, Christoph Birk wrote: > >> On Mon, 10 Mar 2008, Christoph Birk wrote: >> >>> On Tue, 11 Mar 2008, Petr Baudis wrote: >>> On Mon, Mar 10, 2008 at 06:57:07PM -0400, Don Dailey wrote: > I think you may still have a bug. You should get well over 1700 with > 110,000 playouts, even if they are light playouts. > >>> I will run myCtest with 110k-playout, c=0.25 and node creation >>> after the 2nd visit ... let's see what its ELO rating will be >>> in a couple of days. >>> >> Sorry, I just realized I cannot do 110k playouts because my >> implementation is too slow. >> I suggest you run a 'pachi-0.25-light-50k' that just uses >> 5 playouts. That way you can compare it to 'myCtest-V-0003'. >> > > Sure, it now won its first game as pachi1-p0.25-li50k. It will be > interesting how much difference number of playouts makes for me too. > > >> BTW: I count the new-node threshold like Don from the parent >> node, so 50 not far from your '2'. >> > > Hmm, I really wonder where this comes from, the idea seems quite > unnatural to me. > > ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Optimal explore rates for plain UCT
On Mar 10, 2008, at 8:07 PM, Don Dailey <[EMAIL PROTECTED]> wrote: Some programs hash each position and the tree is more abstract, no pointers just positions leading to other positions by zobrist hash keys in a hash table. My scheme probably wastes a lot of space on nodes that are left unvisited at the leaves of the tree. But I don't waste much on storing pointers since I keep them in an array. What is the state of the art on this? How is everyone else doing it? Before creating a child, I calculate hash keys like I would for situational super ko. If there's a hit in the transposition table, I reuse the pointer. If not, I create and add. It's simple to implement and should significantly reduce searches. ___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/