Re: [computer-go] More UCT / Monte-Carlo questions

Don Dailey Tue, 05 Feb 2008 11:52:46 -0800


David Fotland wrote:
>
> Can you elaborate on what is in a node, and what you mean by expand?
>   I assume you have simple node, where each node represents a position
> and the single move to get there.  Then when you find a node with no
> children and expand it, you allocate up to 81 new nodes, but you only
> make one random playout.  If uct picks a different leaf next time, you
> end up with most of the leaf nodes never visited.  In this case you
> run out of memory quickly.  If there are a few hundred K playouts to
> pick a move, and 90% of the leaves have no visits, then you need over
> a million nodes of memory.
>
>  
>
> How do other programs handle this?  I see that aya has an array of all
> children in each node.  This still means allocating memory for all
> children when a new node is allocated.
>
My program is like Aya.    When a new node is first visited, I create
all the children nodes, which is 81 from the opening position of 9x9.   
To save space I don't create 81 pointers to the children, just 1 pointer
to the first child and the rest are sequential in memory.  Of course I
also keep a counter of how many legal children there are.


- Don



>  
>
> It think many programs run several simulations through a node before
> allocating the children.  I can see how this saves memory, but then
> how do you save the RAVE information from the early simulations?
>
>  
>
> David
>
>  
>
> *From:* [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] *On Behalf Of *Mark Boon
> *Sent:* Tuesday, February 05, 2008 10:32 AM
> *To:* computer-go
> *Subject:* Re: [computer-go] More UCT / Monte-Carlo questions
>
>  
>
> Thanks for the answers from both David and Magnus.
>
>  
>
> I had a win-rate of 3:1 after 50 games, I thought that was significant
> enough. I'll do more testing and see if this was a fluke.
>
>  
>
> I expand one node at a time, I don't see what memory problems it may
> cause so far. But my thinking times have been 5 sec. maximum and only
> on 9x9 so I don't have that many nodes to fill a lot of memory. I'm
> not going to sweat that part right now as I think the playouts will
> get slower and reduce memory-usage by itself in time.
>
>  
>
> With pseudo-liberties I get 20K playouts/sec. (2Ghz Intel, 9x9 board)
> That compares to 30K/sec. doing just playouts and no search. It
> surprised me to see the tree-traversal takes this much time but it may
> also be caused by trashing the cash and benchmarking the playouts by
> themselves make things look too good because it'll all fits in the cache.
>
>  
>
> With tactics the playouts during search drop to 4K/sec. Of course I
> don't capture stones that can't escape. I haven't spent a lot of time
> optimizing yet. I'd first rather experiment a lot more. At this point
> the actual speed is not so important as long as I can see relative
> improvements. My framework enables me to make and test different
> variations easily. In the end I'd prefer to optimize the best one
> instead of half a dozen of them.
>
>  
>
> What I did was to play deterministically during playout but use the
> tactical information just to first select the tactical moves ahead of
> the other during selection. The last part seems to give no gain at
> best, which surprised me. I can see that its effect is low in nodes
> that are visited often, but even those nodes are based on nodes deeper
> in the tree that are not visited as often. I'll have to investigate
> this further to know for sure what's going on.
>
>  
>
> Mark
>
>  
>
> On 5-feb-08, at 14:46, David Fotland wrote:
>
>
>
> Hi Mark,
>
>  
>
> You should run a lot more test games.  The 95% confidence interval on
> the result is at least sqrt(1/num_games), so you need 400 or more
> games to know the win rate within 5%.  I’ve seen many anomalous win
> rates when I used to test with 20 games.  Now I use 200 games minimum,
> and I try to get 500 before I make any conclusion.
>
>  
>
> I think mogo is the only strong program that uses the UCB1-tuned
> formula.  The others use the same formula you use.  I found a thesis
> where they measured many different formulas and found little
> difference.   If any strong program other than mogo uses some formula
> other than the basic one, can you please let us know?
>
>  
>
> The reason to only initialize the nodes after a certain count is to
> save memory.  The simple uct algorithm visits each child once before
> using uct to choose one.  If you create all child nodes before any are
> visited you end up with most of the nodes in the tree having zero visits.
>
>  
>
> How many playouts per second are you getting (from the start of the
> game on a 9x9 board), and on what hardware?
>
>  
>
> Regards,
>
>  
>
> David
>
>  
>
> *From:* [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] *On Behalf Of *Mark Boon
> *Sent:* Tuesday, February 05, 2008 7:54 AM
> *To:* computer-go
> *Subject:* [computer-go] More UCT / Monte-Carlo questions
>
>  
>
> Although most of my time has been eaten up by implementing/improving
> some general framework parts I did get a chance to play a bit with a
> simple UCT search. Some things that I found puzzled me a bit and I
> hoped someone had an explanation or similar experiences.
>
>  
>
> I implemented a very basic UCT / MC program first using
> pseudo-liberties. I figured this should be the base-line against which
> I can test some ideas. To test if the program actually worked properly
> I first let it play against Orego. The speed of my playouts are
> similar to Orego so I figured the level of play should be similar. (I
> switched off pondering and multiple-threading in Orego to get an
> apples-to-apples comparison.)
>
>  
>
> To my surprise my program seemed to be winning the majority of the
> games (after a few dozen games). When looking at Orego's output I
> couldn't help noticing that at the start of the game it prints much
> smaller numbers of 'runs' than my program, whereas by the end of the
> game the numbers are similar. This may be the reason for my program
> performing better. When I looked at the code of Orego I noticed there
> are two main differences:
>
>  
>
> - It computes the UCT value in a completely different way. A comment
> in the code refers to a paper called "Modification of UCT with
> Patterns in Monte-Carlo Go". I haven't studied this yet, but whatever
> it does it apparently doesn't do wonders over the standard C * sqrt(
> (2*ln(N)) / (10*n) ) that I use.
>
>  
>
> - It only initialises the list of untried moves in the tree after a
> node had a minimum run-count of 81 (on 9x9). For the life of me I
> couldn't figure out what the effect of this was or what it actually
> does. I was wondering if this has an effect of what is counted as a
> 'run' but I'm not sure.
>
>  
>
> Then I found a paragraph (4.2) in Remi Coulomn's paper about ELO
> raings in patterns. It briefly describes it as "As soon as a number of
> simulations is equal to the number of points on the board, this node
> is promoted to internal node, and pruning is applied." I can't help
> feeling that the implementation in Orego is doing this. But I can't
> figure out where it does any pruning or applying patterns of any kind.
> Is there supposedly a general benefit to this even without pruning or
> patterns? As stated before, at least it doesn't seem to provide any
> benefit over my more primitive implementation. Maybe Peter Drake or
> someone else familiar with Orego knows more about this?
>
>  
>
> Anyway, reading the same paragraph mentioned above again I was struck
> by another detail I thought surprising: after doing the required
> number of runs, the candidates are pruned to a certain number 'n'
> based on patterns. Does that mean from then on the win-ratio is
> ignored? What if the by far most successful move so far does not match
> any pattern? Am I misunderstanding something here? The paragraph is
> very brief and does not elaborate much detail.
>
>  
>
> On to my next step I introduced some very basic tactics to save stones
> with one liberty, capture the opponent's stones with one liberty and
> capturing the opponent's stones in a ladder. There are many possible
> choices here. Just doing this near the last move and/or over the whole
> board. Doing this in the simulation and/or during the selection.
>
>  
>
> Just doing this near the last move during simulation caused a
> slow-down of a factor 4 or 5 but improves play considerably. Also
> doing this near the last move during selection doesn't affect speed
> much but deteriorated play! Doing this first near the last move and
> then look for tactics over the whole board as a next step affected
> results negatively even more. Number of playouts are still in the same
> ball-park.
>
>  
>
> Thinking it over, since I don't use this to prune the selection but
> just to order the candidates I could see that after many runs the
> ordering suggested by the tactics get overriden by the UCT selection.
> So I could see the effect of using this for selection reduced steadily
> with the number of runs through a node. But still I didn't expect a
> considerable reduction in strength. So what could be happening here?
>
>  
>
> - I could have a bug.
>
> - I didn't run enough games (about 50)
>
> - Using knowledge to order the initial selection is counter-productive
> when not accompanied with pruning.
>
>  
>
> The last one I find very hard to believe. Did anyone else run into
> something like this?
>
>  
>
> Finally, I also looked a bit at using more threads to make use of more
> than one processor. I figure this can wait and it's better to keep
> things simple at this early stage but still it's something I want to
> keep in mind. When looking at what I need to do to enable multiple
> threads during search it seems to me I'll be required to lock
> substantial parts of the UCT-tree. This means traversing the tree when
> looking for the best node to expand is going to be the main
> bottle-neck. Maybe not with just two to four processors, but I foresee
> substantial diminishing returns after that. Is this correct? Is there
> experience with many processors? Maybe a different expansion algorithm
> will be required?
>
>  
>
>           Mark
>
>  
>
>  
>
>  
>
>  
>
>  
>
> _______________________________________________
>
> computer-go mailing list
>
> computer-go@computer-go.org <mailto:computer-go@computer-go.org>
>
> http://www.computer-go.org/mailman/listinfo/computer-go/
>
>  
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] More UCT / Monte-Carlo questions

Reply via email to