Re: [computer-go] More UCT / Monte-Carlo questions

Mark Boon Tue, 05 Feb 2008 10:32:52 -0800

Thanks for the answers from both David and Magnus.

I had a win rate of 3:1 after 50 games, which I thought was significant enough. I'll do more testing and see if this was a fluke.

I expand one node at a time, so I don't see what memory problems it may cause so far. But my thinking times have been 5 sec. maximum and only on 9x9, so I don't have enough nodes to fill a lot of memory. I'm not going to sweat that part right now, as I think the playouts will get slower and that will reduce memory usage by itself in time.

With pseudo-liberties I get 20K playouts/sec. (2 GHz Intel, 9x9 board). That compares to 30K/sec. doing just playouts and no search. It surprised me to see that the tree traversal takes this much time, but it may also be caused by thrashing the cache: benchmarking the playouts by themselves makes things look too good because it all fits in the cache.

With tactics the playouts during search drop to 4K/sec. Of course I don't capture stones that can't escape. I haven't spent a lot of time optimizing yet. I'd rather experiment a lot more first. At this point the actual speed is not so important as long as I can see relative improvements. My framework enables me to make and test different variations easily. In the end I'd prefer to optimize the best one instead of half a dozen of them.

What I did was to play deterministically during playout but use the tactical information just to first select the tactical moves ahead of the others during selection. The last part seems to give no gain at best, which surprised me. I can see that its effect is low in nodes that are visited often, but even those nodes are based on nodes deeper in the tree that are not visited as often. I'll have to investigate this further to know for sure what's going on.
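
As an illustration of ordering without pruning, here is a minimal Python sketch. The function name `order_candidates` and the move labels are hypothetical and not taken from the program discussed here. Tactical moves are only moved to the front of the candidate list; no candidate is removed.

```python
def order_candidates(moves, tactical_moves):
    """Return all moves, with tactical ones first (hypothetical helper).

    Nothing is pruned: non-tactical moves remain available to the search,
    they are simply tried later.
    """
    tactical = set(tactical_moves)
    # sorted() is stable, so moves keep their original relative order
    # within the tactical group and within the non-tactical group.
    return sorted(moves, key=lambda move: move not in tactical)


if __name__ == "__main__":
    print(order_candidates(["a1", "b2", "c3", "d4"], {"c3"}))
```

Because the ordering only decides which untried move is expanded first, its influence should fade once every candidate has been visited and the UCT statistics take over.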


Mark

On 5-feb-08, at 14:46, David Fotland wrote:

Hi Mark,
You should run a lot more test games. The 95% confidence interval on the result is at least sqrt(1/num_games), so you need 400 or more games to know the win rate within 5%. I’ve seen many anomalous win rates when I used to test with 20 games. Now I use 200 games minimum, and I try to get 500 before I draw any conclusion.
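
A rough Python sketch of that rule of thumb, for illustration only. The function names are hypothetical. It assumes independent games and a true win rate near 50%, where the normal-approximation 95% half-width 1.96*sqrt(p*(1-p)/n) is about 0.98/sqrt(n), which is close to sqrt(1/num_games).

```python
import math


def rule_of_thumb_margin(num_games):
    """Approximate 95% half-width of the win rate: sqrt(1/num_games)."""
    return math.sqrt(1.0 / num_games)


def normal_margin(win_rate, num_games, z=1.96):
    """Normal-approximation half-width for an observed win rate.

    Assumes independent games; z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(win_rate * (1.0 - win_rate) / num_games)


if __name__ == "__main__":
    for n in (20, 50, 200, 400, 500):
        print(f"{n:4d} games: rule of thumb +/- {rule_of_thumb_margin(n):.3f}, "
              f"normal approx. at 50% +/- {normal_margin(0.5, n):.3f}")
```

The margin shrinks only with the square root of the number of games, so halving it takes four times as many games.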
I think MoGo is the only strong program that uses the UCB1-tuned formula. The others use the same formula you use. I found a thesis where they measured many different formulas and found little difference. If any strong program other than MoGo uses some formula other than the basic one, can you please let us know?
The reason to only initialize the nodes after a certain count is to save memory. The simple UCT algorithm visits each child once before using UCT to choose one. If you create all child nodes before any are visited, you end up with most of the nodes in the tree having zero visits.
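
The memory argument can be sketched in Python as follows. This is an assumed, simplified layout rather than the code of any particular program. The names `Node` and `visit` are hypothetical, and `threshold` stands for the minimum visit count (for example 81 on 9x9, as in the quoted message below).

```python
class Node:
    """Tree node whose children are allocated lazily (hypothetical layout)."""

    def __init__(self):
        self.visits = 0
        self.children = None  # not allocated until the node is visited enough


def visit(node, legal_moves, threshold):
    """Record one visit and allocate children once the threshold is reached.

    Returns True if the node has children, i.e. the tree search can descend
    further. Returns False if the caller should run a playout from here.
    """
    node.visits += 1
    if node.children is None and node.visits >= threshold:
        node.children = {move: Node() for move in legal_moves}
    return node.children is not None


if __name__ == "__main__":
    root = Node()
    moves = list(range(81))  # stand-in for the legal moves on an empty 9x9 board
    expanded_at = None
    for i in range(1, 101):
        if visit(root, moves, threshold=81) and expanded_at is None:
            expanded_at = i
    print("children allocated at visit", expanded_at)
```

With this scheme a leaf that is visited only a handful of times never allocates its child nodes, which is where the memory saving comes from.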
How many playouts per second are you getting (from the start of the game on a 9x9 board), and on what hardware?
Regards,



David
From: [EMAIL PROTECTED] [mailto:computer-go-[EMAIL PROTECTED] On Behalf Of Mark Boon
Sent: Tuesday, February 05, 2008 7:54 AM
To: computer-go
Subject: [computer-go] More UCT / Monte-Carlo questions
Although most of my time has been eaten up by implementing/improving some general framework parts, I did get a chance to play a bit with a simple UCT search. Some things that I found puzzled me a bit and I hoped someone had an explanation or similar experiences.
I implemented a very basic UCT / MC program first using pseudo-liberties. I figured this should be the base-line against which I can test some ideas. To test if the program actually worked properly I first let it play against Orego. The speed of my playouts is similar to Orego's, so I figured the level of play should be similar. (I switched off pondering and multi-threading in Orego to get an apples-to-apples comparison.)
To my surprise my program seemed to be winning the majority of the games (after a few dozen games). When looking at Orego's output I couldn't help noticing that at the start of the game it prints much smaller numbers of 'runs' than my program, whereas by the end of the game the numbers are similar. This may be the reason for my program performing better. When I looked at the code of Orego I noticed there are two main differences:
- It computes the UCT value in a completely different way. A comment in the code refers to a paper called "Modification of UCT with Patterns in Monte-Carlo Go". I haven't studied this yet, but whatever it does it apparently doesn't do wonders over the standard C * sqrt( (2*ln(N)) / (10*n) ) that I use.
- It only initialises the list of untried moves in the tree after a node had a minimum run-count of 81 (on 9x9). For the life of me I couldn't figure out what the effect of this was or what it actually does. I was wondering if this has an effect on what is counted as a 'run' but I'm not sure.
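
For reference, the exploration term quoted in the first item can be written out as a small Python sketch. The function names, the default C = 1.0 and the addition of the win rate wins/n as the exploitation part are assumptions made for illustration. The message gives only the exploration term itself.

```python
import math


def exploration_term(parent_visits, child_visits, c=1.0):
    """C * sqrt((2 * ln(N)) / (10 * n)), as quoted in the message.

    N = parent_visits, n = child_visits (must be > 0). The value of C is
    not given in the message, so c=1.0 is only a placeholder.
    """
    return c * math.sqrt((2.0 * math.log(parent_visits)) / (10.0 * child_visits))


def uct_value(wins, child_visits, parent_visits, c=1.0):
    """Assumed standard UCT form: observed win rate plus exploration term."""
    return wins / child_visits + exploration_term(parent_visits, child_visits, c)


if __name__ == "__main__":
    # Two children of a parent with 1000 visits and the same 50% win rate:
    # the less-visited child gets the larger exploration bonus.
    print(uct_value(wins=5, child_visits=10, parent_visits=1000))
    print(uct_value(wins=250, child_visits=500, parent_visits=1000))
```

The bonus shrinks as a child accumulates visits, so rarely visited moves are eventually tried again even if their early results were poor.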
Then I found a paragraph (4.2) in Rémi Coulom's paper about Elo ratings in patterns. It briefly describes it as "As soon as a number of simulations is equal to the number of points on the board, this node is promoted to internal node, and pruning is applied." I can't help feeling that the implementation in Orego is doing this. But I can't figure out where it does any pruning or applies patterns of any kind. Is there supposedly a general benefit to this even without pruning or patterns? As stated before, at least it doesn't seem to provide any benefit over my more primitive implementation. Maybe Peter Drake or someone else familiar with Orego knows more about this?
Anyway, reading the same paragraph mentioned above again I was struck by another detail I thought surprising: after doing the required number of runs, the candidates are pruned to a certain number 'n' based on patterns. Does that mean from then on the win-ratio is ignored? What if the by far most successful move so far does not match any pattern? Am I misunderstanding something here? The paragraph is very brief and does not go into much detail.
On to my next step: I introduced some very basic tactics to save stones with one liberty, capture the opponent's stones with one liberty, and capture the opponent's stones in a ladder. There are many possible choices here: doing this just near the last move and/or over the whole board; doing this in the simulation and/or during the selection.
Just doing this near the last move during simulation caused a slow-down by a factor of 4 or 5 but improved play considerably. Also doing this near the last move during selection didn't affect speed much but deteriorated play! Doing this first near the last move and then looking for tactics over the whole board as a next step affected results negatively even more. The number of playouts is still in the same ball-park.
Thinking it over, since I don't use this to prune the selection but just to order the candidates, I could see that after many runs the ordering suggested by the tactics gets overridden by the UCT selection. So I could see the effect of using this for selection reduce steadily with the number of runs through a node. But still I didn't expect a considerable reduction in strength. So what could be happening here?
- I could have a bug.
- I didn't run enough games (about 50).
- Using knowledge to order the initial selection is counter-productive when not accompanied by pruning.
The last one I find very hard to believe. Did anyone else run into something like this?
Finally, I also looked a bit at using more threads to make use of more than one processor. I figure this can wait and it's better to keep things simple at this early stage, but still it's something I want to keep in mind. When looking at what I need to do to enable multiple threads during search, it seems to me I'll be required to lock substantial parts of the UCT tree. This means traversing the tree when looking for the best node to expand is going to be the main bottle-neck. Maybe not with just two to four processors, but I foresee substantially diminishing returns after that. Is this correct? Is there experience with many processors? Maybe a different expansion algorithm will be required?
          Mark











_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
