Finally got a window of time. It's called the "epsilon trick." Here is a cut and paste of Lukasz' explanation from the archives, including how to calculate N. - Dave Hillis
-------------------------------------- -----Original Message----- From: [EMAIL PROTECTED] To: computer-go@computer-go.org Sent: Sun, 14 Jan 2007 4:50 AM Subject: [computer-go] Fast UCT (epsilon trick) Hi I've looked back into the article and I see that it is not trivial to understand how does the epsilon trick (ET) works. Therefore I will describe it here on the example of UCT. The bottleneck of UCT algorithm is choosing a move. Program has to iterate over all children evaluating their UCB value. We want to use of this loop rarely. One way to do it is to group playouts together. I.e. play 2 or more MC simulations per each tree traversal. But this is not good enough as it doesn't distribute the playouts over the tree and some parts will not get enough of them. The ET works as follows: With each node of the tree program keeps additional information: - child_to_visit - visits_to_go When we visit node, if node.visits_to_go > 0 then node->visits_to_go <- node.visits_to_go - 1 node = child_to_visit So we have predetermined strategy to go to particular child visits_to_go times. But if node.visis_to_go = 0 then // we have to set the strategy first node->child_to_visit = UCT_find_child (node) node->visits_to_go = Epsilon * node->visited_count // round up As we can see, when Epsilon is small or when a particular node is visited not too many times, UCT-ET is identical to UCT. But when node was visited more times (near the root, far from leaves), then UCT-ET may save some time and go to precalculated child. That's it. I hope You like it :) Łukasz Lew PS. In Df-PN, good Epsilon was 1/4 I believe that here it will be much smaller. 1/64 maybe. Or even one may change the Epsilon equation to: node->visits_to_go = Epsilon * sqrt(node->visited_count) If someone will do some experiments, I'm eager to hear the results. PS2 float InvSqrt (float x) { float xhalf = 0.5f * x; int i = *(int*)&x; i = 0x5f3759df - (i>>1); x = *(float*)&i; x = x * (1.5f - xhalf * x * x); return x; } ;-) _______________________________________________ -------------------------------------- -----Original Message----- From: [EMAIL PROTECTED] To: computer-go@computer-go.org Sent: Wed, 3 Dec 2008 12:56 am Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot Hmm.. it could be that N is picked randomly each time... now I can't seem to find the description and my memory is not serving me well here. Perhaps someone else remembers? I'll try to track it down. - Dave Hillis -----Original Message----- From: Michael Williams <[EMAIL PROTECTED]> To: computer-go <computer-go@computer-go.org> Sent: Wed, 3 Dec 2008 12:14 am Subject: Re: [computer-go] Monte-Carlo Tree Search reference b ot That's going to repeat the same exact path through the tree three times, isn't it? If so, it seems like it would be more efficient to do N playouts from the leaf after each traversal of the tree. [EMAIL PROTECTED] wrote: > There is another speedup trick that might interest you. IIRC Lukasz Lew > > came up with it, but I forget what he called it. After calculating the > > selected move for an internal node (going through UCT and RAVE and > whatnot) > store that move. Then, for the next N visits to that node > (where N is 5 or > 10 or so), just repeat that move without having to > calculate what the move > would be. > > - Dave Hillis > > -----Original Message----- > From: Mark Boon <[EMAIL PROTECTED]> > To: computer-go <computer-go@computer-go.org> > Sent: Tue, 2 Dec 2008 11:17 pm > Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot > > I have made some minor performance improvements and this is as far as I > > > intend to take this particular project. I might make some small changes > > > if necessary, but most likely I'll leave this largely unchanged from now. > > > > I had set myself as an arbitrary goal that it should do at least 20K > > > playouts. But with real liberties, AMAF and a RAVE formula I got stuck > in > > the 16K-17K range. According to my profiler that is mostly due to the > > > expensive formula used to compare nodes, where it says it spends 25% of > > > total time. The formula I'm using is: > > beta * (virtu al-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT) > > beta = 1 - log(parent-visits) / 20 > UCT = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) ) > RAVE = exploration-factor *sqrt( log(parent-visits) / > (nr-virtual-visits+1) ) > > There are probably quite a few possibilities still to tune this program > with regards to playing streng th and performance. But I felt it doesn't > help to obscure the implementation by too many details. > > The implementation of the search algorithm was entirely > game-independent, until I introduced AMAF. I didn't see how to fix that, > as there's no way getting around that it's based on the fact that a move > is represented by a single coordinate, which is mapped to an array to > store the statistical values. But strip the AMAF part, which is a block > of 30 odd lines of code, and I think it can be used for other games > basically as-is. I did this not because I ever see myself program > another game, but because in my experience in doing so I get a cleaner > separation between modules. > > At 2,000 playouts, it's still quite a bit weaker than the plain MC-AMAF > refbot. It wins only about 33%. But that's probably because the > 1,000-2,000 playouts range is the sweet-spot for that particular type of > playing engine. It doesn' t scale from there, whereas the MCTS ref-bot > only just gets war med up with 2,000 playouts. > > This leads me to a question. I suppose it might be of some interest to > put this bot up on CGOS . But what parameters to use? The main one being > the number of playouts, naturally. > > Mark > _______________________________________________ > computer-go mailing list > computer-go@computer-go.org <mailto:computer-go@computer-go.org> > http://www.computer-go.org/mailman/listinfo/computer-go/ > > ------------------------------------------------------------------------ > Tis the season to save your money! Get the new AOL Holiday Toolbar > > <http://toolbar.aol.com/holiday/download.html?ncid=emlweusdown00000008> > for > money saving offers and gift ideas. > > > ------------------------------------------------------------------------ > > _______________________________________________ > computer-go mailing list > [EMAIL PROTECTED] > http://www.computer-go.org/mailman/listinfo/computer-go/ _______________________________________________ computer-go mailing list [EMAIL PROTECTED] http://www.computer-go.org/mailman/listinfo/computer-go/ Tis the season to save your money! Get the new AOL Holiday Toolbar for money saving offers and gift ideas. _______________________________________________ omputer-go mailing list [EMAIL PROTECTED] ttp://www.computer-go.org/mailman/listinfo/computer-go/ _______________________________________________ omputer-go mailing list [EMAIL PROTECTED] ttp://www.computer-go.org/mailman/listinfo/computer-go/
_______________________________________________ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/