RE: [computer-go] More UCT / Monte-Carlo questions

David Fotland Tue, 05 Feb 2008 11:11:19 -0800

Can you elaborate on what is in a node, and what you mean by expand?   I
assume you have simple node, where each node represents a position and the
single move to get there.  Then when you find a node with no children and
expand it, you allocate up to 81 new nodes, but you only make one random
playout.  If uct picks a different leaf next time, you end up with most of
the leaf nodes never visited.  In this case you run out of memory quickly.
If there are a few hundred K playouts to pick a move, and 90% of the leaves
have no visits, then you need over a million nodes of memory.


 

How do other programs handle this?  I see that aya has an array of all
children in each node.  This still means allocating memory for all children
when a new node is allocated.

 

It think many programs run several simulations through a node before
allocating the children.  I can see how this saves memory, but then how do
you save the RAVE information from the early simulations?

 

David 

 

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Mark Boon
Sent: Tuesday, February 05, 2008 10:32 AM
To: computer-go
Subject: Re: [computer-go] More UCT / Monte-Carlo questions

 

Thanks for the answers from both David and Magnus.

 

I had a win-rate of 3:1 after 50 games, I thought that was significant
enough. I'll do more testing and see if this was a fluke.

 

I expand one node at a time, I don't see what memory problems it may cause
so far. But my thinking times have been 5 sec. maximum and only on 9x9 so I
don't have that many nodes to fill a lot of memory. I'm not going to sweat
that part right now as I think the playouts will get slower and reduce
memory-usage by itself in time.

 

With pseudo-liberties I get 20K playouts/sec. (2Ghz Intel, 9x9 board) That
compares to 30K/sec. doing just playouts and no search. It surprised me to
see the tree-traversal takes this much time but it may also be caused by
trashing the cash and benchmarking the playouts by themselves make things
look too good because it'll all fits in the cache.

 

With tactics the playouts during search drop to 4K/sec. Of course I don't
capture stones that can't escape. I haven't spent a lot of time optimizing
yet. I'd first rather experiment a lot more. At this point the actual speed
is not so important as long as I can see relative improvements. My framework
enables me to make and test different variations easily. In the end I'd
prefer to optimize the best one instead of half a dozen of them.

 

What I did was to play deterministically during playout but use the tactical
information just to first select the tactical moves ahead of the other
during selection. The last part seems to give no gain at best, which
surprised me. I can see that its effect is low in nodes that are visited
often, but even those nodes are based on nodes deeper in the tree that are
not visited as often. I'll have to investigate this further to know for sure
what's going on.

 

Mark

 

On 5-feb-08, at 14:46, David Fotland wrote:





Hi Mark,

 

You should run a lot more test games.  The 95% confidence interval on the
result is at least sqrt(1/num_games), so you need 400 or more games to know
the win rate within 5%.  I've seen many anomalous win rates when I used to
test with 20 games.  Now I use 200 games minimum, and I try to get 500
before I make any conclusion.

 

I think mogo is the only strong program that uses the UCB1-tuned formula.
The others use the same formula you use.  I found a thesis where they
measured many different formulas and found little difference.   If any
strong program other than mogo uses some formula other than the basic one,
can you please let us know?

 

The reason to only initialize the nodes after a certain count is to save
memory.  The simple uct algorithm visits each child once before using uct to
choose one.  If you create all child nodes before any are visited you end up
with most of the nodes in the tree having zero visits.

 

How many playouts per second are you getting (from the start of the game on
a 9x9 board), and on what hardware?

 

Regards,

 

David

 

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Mark Boon
Sent: Tuesday, February 05, 2008 7:54 AM
To: computer-go
Subject: [computer-go] More UCT / Monte-Carlo questions

 

Although most of my time has been eaten up by implementing/improving some
general framework parts I did get a chance to play a bit with a simple UCT
search. Some things that I found puzzled me a bit and I hoped someone had an
explanation or similar experiences.

 

I implemented a very basic UCT / MC program first using pseudo-liberties. I
figured this should be the base-line against which I can test some ideas. To
test if the program actually worked properly I first let it play against
Orego. The speed of my playouts are similar to Orego so I figured the level
of play should be similar. (I switched off pondering and multiple-threading
in Orego to get an apples-to-apples comparison.)

 

To my surprise my program seemed to be winning the majority of the games
(after a few dozen games). When looking at Orego's output I couldn't help
noticing that at the start of the game it prints much smaller numbers of
'runs' than my program, whereas by the end of the game the numbers are
similar. This may be the reason for my program performing better. When I
looked at the code of Orego I noticed there are two main differences:

 

- It computes the UCT value in a completely different way. A comment in the
code refers to a paper called "Modification of UCT with Patterns in
Monte-Carlo Go". I haven't studied this yet, but whatever it does it
apparently doesn't do wonders over the standard C * sqrt( (2*ln(N)) / (10*n)
) that I use.

 

- It only initialises the list of untried moves in the tree after a node had
a minimum run-count of 81 (on 9x9). For the life of me I couldn't figure out
what the effect of this was or what it actually does. I was wondering if
this has an effect of what is counted as a 'run' but I'm not sure.

 

Then I found a paragraph (4.2) in Remi Coulomn's paper about ELO raings in
patterns. It briefly describes it as "As soon as a number of simulations is
equal to the number of points on the board, this node is promoted to
internal node, and pruning is applied." I can't help feeling that the
implementation in Orego is doing this. But I can't figure out where it does
any pruning or applying patterns of any kind. Is there supposedly a general
benefit to this even without pruning or patterns? As stated before, at least
it doesn't seem to provide any benefit over my more primitive
implementation. Maybe Peter Drake or someone else familiar with Orego knows
more about this?

 

Anyway, reading the same paragraph mentioned above again I was struck by
another detail I thought surprising: after doing the required number of
runs, the candidates are pruned to a certain number 'n' based on patterns.
Does that mean from then on the win-ratio is ignored? What if the by far
most successful move so far does not match any pattern? Am I
misunderstanding something here? The paragraph is very brief and does not
elaborate much detail.

 

On to my next step I introduced some very basic tactics to save stones with
one liberty, capture the opponent's stones with one liberty and capturing
the opponent's stones in a ladder. There are many possible choices here.
Just doing this near the last move and/or over the whole board. Doing this
in the simulation and/or during the selection.

 

Just doing this near the last move during simulation caused a slow-down of a
factor 4 or 5 but improves play considerably. Also doing this near the last
move during selection doesn't affect speed much but deteriorated play! Doing
this first near the last move and then look for tactics over the whole board
as a next step affected results negatively even more. Number of playouts are
still in the same ball-park.

 

Thinking it over, since I don't use this to prune the selection but just to
order the candidates I could see that after many runs the ordering suggested
by the tactics get overriden by the UCT selection. So I could see the effect
of using this for selection reduced steadily with the number of runs through
a node. But still I didn't expect a considerable reduction in strength. So
what could be happening here?

 

- I could have a bug.

- I didn't run enough games (about 50)

- Using knowledge to order the initial selection is counter-productive when
not accompanied with pruning.

 

The last one I find very hard to believe. Did anyone else run into something
like this?

 

Finally, I also looked a bit at using more threads to make use of more than
one processor. I figure this can wait and it's better to keep things simple
at this early stage but still it's something I want to keep in mind. When
looking at what I need to do to enable multiple threads during search it
seems to me I'll be required to lock substantial parts of the UCT-tree. This
means traversing the tree when looking for the best node to expand is going
to be the main bottle-neck. Maybe not with just two to four processors, but
I foresee substantial diminishing returns after that. Is this correct? Is
there experience with many processors? Maybe a different expansion algorithm
will be required?

 

          Mark

 

 

 

 

 

_______________________________________________

computer-go mailing list

computer-go@computer-go.org

http://www.computer-go.org/mailman/listinfo/computer-go/

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

RE: [computer-go] More UCT / Monte-Carlo questions

Reply via email to