David Fotland wrote: > > Can you elaborate on what is in a node, and what you mean by expand? > I assume you have simple node, where each node represents a position > and the single move to get there. Then when you find a node with no > children and expand it, you allocate up to 81 new nodes, but you only > make one random playout. If uct picks a different leaf next time, you > end up with most of the leaf nodes never visited. In this case you > run out of memory quickly. If there are a few hundred K playouts to > pick a move, and 90% of the leaves have no visits, then you need over > a million nodes of memory. > > > > How do other programs handle this? I see that aya has an array of all > children in each node. This still means allocating memory for all > children when a new node is allocated. > My program is like Aya. When a new node is first visited, I create all the children nodes, which is 81 from the opening position of 9x9. To save space I don't create 81 pointers to the children, just 1 pointer to the first child and the rest are sequential in memory. Of course I also keep a counter of how many legal children there are.
- Don > > > It think many programs run several simulations through a node before > allocating the children. I can see how this saves memory, but then > how do you save the RAVE information from the early simulations? > > > > David > > > > *From:* [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] *On Behalf Of *Mark Boon > *Sent:* Tuesday, February 05, 2008 10:32 AM > *To:* computer-go > *Subject:* Re: [computer-go] More UCT / Monte-Carlo questions > > > > Thanks for the answers from both David and Magnus. > > > > I had a win-rate of 3:1 after 50 games, I thought that was significant > enough. I'll do more testing and see if this was a fluke. > > > > I expand one node at a time, I don't see what memory problems it may > cause so far. But my thinking times have been 5 sec. maximum and only > on 9x9 so I don't have that many nodes to fill a lot of memory. I'm > not going to sweat that part right now as I think the playouts will > get slower and reduce memory-usage by itself in time. > > > > With pseudo-liberties I get 20K playouts/sec. (2Ghz Intel, 9x9 board) > That compares to 30K/sec. doing just playouts and no search. It > surprised me to see the tree-traversal takes this much time but it may > also be caused by trashing the cash and benchmarking the playouts by > themselves make things look too good because it'll all fits in the cache. > > > > With tactics the playouts during search drop to 4K/sec. Of course I > don't capture stones that can't escape. I haven't spent a lot of time > optimizing yet. I'd first rather experiment a lot more. At this point > the actual speed is not so important as long as I can see relative > improvements. My framework enables me to make and test different > variations easily. In the end I'd prefer to optimize the best one > instead of half a dozen of them. > > > > What I did was to play deterministically during playout but use the > tactical information just to first select the tactical moves ahead of > the other during selection. The last part seems to give no gain at > best, which surprised me. I can see that its effect is low in nodes > that are visited often, but even those nodes are based on nodes deeper > in the tree that are not visited as often. I'll have to investigate > this further to know for sure what's going on. > > > > Mark > > > > On 5-feb-08, at 14:46, David Fotland wrote: > > > > Hi Mark, > > > > You should run a lot more test games. The 95% confidence interval on > the result is at least sqrt(1/num_games), so you need 400 or more > games to know the win rate within 5%. I’ve seen many anomalous win > rates when I used to test with 20 games. Now I use 200 games minimum, > and I try to get 500 before I make any conclusion. > > > > I think mogo is the only strong program that uses the UCB1-tuned > formula. The others use the same formula you use. I found a thesis > where they measured many different formulas and found little > difference. If any strong program other than mogo uses some formula > other than the basic one, can you please let us know? > > > > The reason to only initialize the nodes after a certain count is to > save memory. The simple uct algorithm visits each child once before > using uct to choose one. If you create all child nodes before any are > visited you end up with most of the nodes in the tree having zero visits. > > > > How many playouts per second are you getting (from the start of the > game on a 9x9 board), and on what hardware? > > > > Regards, > > > > David > > > > *From:* [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] *On Behalf Of *Mark Boon > *Sent:* Tuesday, February 05, 2008 7:54 AM > *To:* computer-go > *Subject:* [computer-go] More UCT / Monte-Carlo questions > > > > Although most of my time has been eaten up by implementing/improving > some general framework parts I did get a chance to play a bit with a > simple UCT search. Some things that I found puzzled me a bit and I > hoped someone had an explanation or similar experiences. > > > > I implemented a very basic UCT / MC program first using > pseudo-liberties. I figured this should be the base-line against which > I can test some ideas. To test if the program actually worked properly > I first let it play against Orego. The speed of my playouts are > similar to Orego so I figured the level of play should be similar. (I > switched off pondering and multiple-threading in Orego to get an > apples-to-apples comparison.) > > > > To my surprise my program seemed to be winning the majority of the > games (after a few dozen games). When looking at Orego's output I > couldn't help noticing that at the start of the game it prints much > smaller numbers of 'runs' than my program, whereas by the end of the > game the numbers are similar. This may be the reason for my program > performing better. When I looked at the code of Orego I noticed there > are two main differences: > > > > - It computes the UCT value in a completely different way. A comment > in the code refers to a paper called "Modification of UCT with > Patterns in Monte-Carlo Go". I haven't studied this yet, but whatever > it does it apparently doesn't do wonders over the standard C * sqrt( > (2*ln(N)) / (10*n) ) that I use. > > > > - It only initialises the list of untried moves in the tree after a > node had a minimum run-count of 81 (on 9x9). For the life of me I > couldn't figure out what the effect of this was or what it actually > does. I was wondering if this has an effect of what is counted as a > 'run' but I'm not sure. > > > > Then I found a paragraph (4.2) in Remi Coulomn's paper about ELO > raings in patterns. It briefly describes it as "As soon as a number of > simulations is equal to the number of points on the board, this node > is promoted to internal node, and pruning is applied." I can't help > feeling that the implementation in Orego is doing this. But I can't > figure out where it does any pruning or applying patterns of any kind. > Is there supposedly a general benefit to this even without pruning or > patterns? As stated before, at least it doesn't seem to provide any > benefit over my more primitive implementation. Maybe Peter Drake or > someone else familiar with Orego knows more about this? > > > > Anyway, reading the same paragraph mentioned above again I was struck > by another detail I thought surprising: after doing the required > number of runs, the candidates are pruned to a certain number 'n' > based on patterns. Does that mean from then on the win-ratio is > ignored? What if the by far most successful move so far does not match > any pattern? Am I misunderstanding something here? The paragraph is > very brief and does not elaborate much detail. > > > > On to my next step I introduced some very basic tactics to save stones > with one liberty, capture the opponent's stones with one liberty and > capturing the opponent's stones in a ladder. There are many possible > choices here. Just doing this near the last move and/or over the whole > board. Doing this in the simulation and/or during the selection. > > > > Just doing this near the last move during simulation caused a > slow-down of a factor 4 or 5 but improves play considerably. Also > doing this near the last move during selection doesn't affect speed > much but deteriorated play! Doing this first near the last move and > then look for tactics over the whole board as a next step affected > results negatively even more. Number of playouts are still in the same > ball-park. > > > > Thinking it over, since I don't use this to prune the selection but > just to order the candidates I could see that after many runs the > ordering suggested by the tactics get overriden by the UCT selection. > So I could see the effect of using this for selection reduced steadily > with the number of runs through a node. But still I didn't expect a > considerable reduction in strength. So what could be happening here? > > > > - I could have a bug. > > - I didn't run enough games (about 50) > > - Using knowledge to order the initial selection is counter-productive > when not accompanied with pruning. > > > > The last one I find very hard to believe. Did anyone else run into > something like this? > > > > Finally, I also looked a bit at using more threads to make use of more > than one processor. I figure this can wait and it's better to keep > things simple at this early stage but still it's something I want to > keep in mind. When looking at what I need to do to enable multiple > threads during search it seems to me I'll be required to lock > substantial parts of the UCT-tree. This means traversing the tree when > looking for the best node to expand is going to be the main > bottle-neck. Maybe not with just two to four processors, but I foresee > substantial diminishing returns after that. Is this correct? Is there > experience with many processors? Maybe a different expansion algorithm > will be required? > > > > Mark > > > > > > > > > > > > _______________________________________________ > > computer-go mailing list > > computer-go@computer-go.org <mailto:computer-go@computer-go.org> > > http://www.computer-go.org/mailman/listinfo/computer-go/ > > > > ------------------------------------------------------------------------ > > _______________________________________________ > computer-go mailing list > computer-go@computer-go.org > http://www.computer-go.org/mailman/listinfo/computer-go/ _______________________________________________ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/