RE: [computer-go] UCT vs MC

David Fotland Mon, 01 Jan 2007 10:28:53 -0800

Thanks.  So it seems that doing as many random games as possible is not the
ideal approach.


In UCT, I suppose the equivalent of the principal variation would be the
path from the root that always visits the child with the highest number of
simulations.  When you make a move with 70,000 simulations, how deep is this
UCT_principal variation?  

I expect that after this many simulations the UCT tree will include every
position in some number of ply from the root.  Deeper in the tree it will
not visit every child.  Typically, how many ply in the UCT tree is full
width?

I'm curious about the full width depth and the principal variation depth to
compare UCT wilth alpha-beta.

It's great to see a new approach to computer go that works so well.  Thanks
for sharing your work.

David

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> [EMAIL PROTECTED]
> Sent: Monday, January 01, 2007 3:56 AM
> To: computer-go
> Subject: Re: [computer-go] UCT vs MC
> 
> 
> Hello and happy new year!
> 
> 
> > I have some questions about your paper...
> Whouah that's a lot of questions :). I'll try to answer well.
> 
> 
> > I thought that the Monte Carlo evaluation of a position is done by 
> > making many random games from that position, and averaging the 
> > win/loss result. So the evaluation of a position would be a number 
> > between 0 and 1.  I thought several thousand random games would be 
> > used for one evaluation.
> No, we consider one evaluation as one simulation, so the 
> evaluation function 
> is a bernoulli random variable. Now you are mainly interested in its 
> expectation, and it is why you can think making a lot of 
> simulations and 
> averaging, trying to approximate the expectation by the 
> empirical average. 
> 
> > In your paper, say that each UCT leaf node is evaluated by 
> exactly one 
> > random game (simulation), with a result of 0 or 1.  Is this true?
> Yes it is true. We try having more that one simulation per 
> leaf node, and: -with the same number of nodes, the 
> improvement is very small; -with the same number of total 
> simulations the level is much weaker.
> 
> This seems counter intuitive, but in fact it is not. At each 
> simulation we add 
> a node. So a node often visited will have a lot of 
> descendants, so will 
> average a lot of simulations. The key idea of UCT is that the 
> value in a node 
> is the average of its children's values weigthed by the 
> frequency of visits 
> for its children.
> 
> > I think
> > you say Mogo does 70,000 random games per move.  Does this 
> mean that the
> > UCT tree for a move has 70,000 nodes?  When you say 70,000 
> games per move,
> > does that mean total game move made, or game per node evaluation?
> That means per MoGo's move. So yes, UCT tree for a move has 
> 70000 nodes. It is 
> the total number of simulations.
> I use the same count for the CGOS versions of MoGo. 
> MoGo_xxx_10k uses 10000 
> simulations for each move, or if you prefer, 10000 nodes in 
> its tree. That 
> means that if the CGOS game finished after 40 MoGo moves, 
> then MoGo has 
> computed 400 000 random simulations for the complete game. 
> (There is also a 
> 5000 simulations per move version on CGOS).
> 
> 
> > How many simulations (random games with patterns) does Mogo do per 
> > second?
> On a P4 3.4Ghz, 4500 in 9x9, 1000 in 19x19.
> 
> > How do you back up values in the UCT tree?  There are values in the 
> > example tree, but I can't see how they are calculated.
> 
> As in the UCT algorithm. For each node for the root to the 
> leaf of the current 
> sequence you simply add the 0/1 result to a variable, and 1 
> to the count of 
> the number of simulations.
> 
> > Your code says that the value is backed up by sum and 
> negation (line 
> > 26, value := -value).  But I don't see any negative values in your 
> > sample tree, or values greater than one.  How do you 
> actually back up 
> > values to the root?
> Sorry, it is value := 1-value. Thank you for pointing out the mistake.
> 
> 
> > on page 5 you say that UCB1-TUNED performs better and you use that 
> > formula. In the code for the algorithm, you use UCB (line 
> 16).  Which 
> > is correct?
> 
> Since the beginning we used UCB1-TUNED and it performed 
> better. Now with all 
> other improvements, and with a fine parameters tuning the 
> difference is very 
> small. UCB1-TUNED has the advantage that it does not need a 
> parameters tuning 
> to performs well in Go (the famous sqrt(1/10) constant Remi 
> Coulom posted in 
> this list).
> 
> > In your paper you show win rates against GnuGo of about 
> 50%, depending 
> > on the parameters.  The current Mogo beats GnuGo over 90%.  What 
> > changed?  Are you doing more simulations, or do you have more go 
> > knowledge in your patterns?
> 
> The results near 50% was with the uniform random simulations. 
> The 80% is with 
> the improved simulations. In the current MoGo there are new 
> improvements not 
> yet published. Currently, against gnugo 3.6 level 8 with 
> 70000 simulations, 
> the result is 92.5%.
> MoGo which plays on tournaments makes more than 70000 
> simulations (plays on a 
> quadri processors), but gnugo is also stronger (3.7.10 with 
> level > 8).
> 
> > Does Mogo have an opening book?
> Yes, it is an opening book generated using a "meta-UCT", that 
> means using UCT 
> algorithm from the empty position, and using MoGo playing 
> against itself as 
> the evaluation function. However the opening book is very 
> small (3 ply in 
> average, 5 ply maximum).
> 
> I hope that makes things clearer.
> 
> Sylvain
> 
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of 
> > > [EMAIL PROTECTED]
> > > Sent: Sunday, December 31, 2006 11:50 AM
> > > To: computer-go
> > > Subject: Re: [computer-go] UCT vs MC
> > >
> > >
> > > Hello,
> > >
> > > > It looks to me that the strength of the top programs, 
> like Mogo, 
> > > > is mostly due to the new UCT search algorithm.
> > >
> > > It depends what you compare to.
> > > If you compare UCT against no tree, this makes a lot of 
> difference. 
> > > If you compare UCT to former Remi Coulom's tree search algorithm, 
> > > Remi can answer better than me. He posted a message some 
> months ago, 
> > > and (if I remember well) his experiments showed an 
> improvement like
> > > 5-10% against gnugo
> > > comparing to his previous tree search algorithm (which 
> was already an
> > > improvement over other tree search algorithms).
> > > If you compare uniform-MC/UCT against top programs in 9x9,
> > > the difference is
> > > huge. MoGo was using UCT since its first version on CGOS and
> > > was ranked at
> > > 1650. So with UCT and uniform MC you are very far from the
> > > top. UCT alone is
> > > far to be enough.
> > >
> > > > I wonder what would happen if we took a tradional strong
> > >
> > > computer-go
> > >
> > > > evaluation function and combined it with UCT?
> > >
> > > This should be an interesting experiment. My bet is that 
> it should 
> > > not change a lot the strength. May be this can be worst than 
> > > alpha-beta. Perhaps the
> > > gnugo team already tried?
> > > UCT has its advantages and drawbacks, and I think its
> > > performance against
> > > alpha-beta depends on the evaluation function. UCT is
> > > efficient for MC
> > > evaluation function. It is not obvious it is efficient with
> > > "traditional"
> > > evaluation function. I would be very happy to know the result.
> > >
> > > Sylvain
> > >
> > > _______________________________________________
> > > computer-go mailing list
> > > computer-go@computer-go.org 
> > > http://www.computer-go.org/mailman/listinfo/computer-go/
> >
> > _______________________________________________
> > computer-go mailing list
> > computer-go@computer-go.org 
> > http://www.computer-go.org/mailman/listinfo/computer-go/
> 
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org 
> http://www.computer-go.org/mailman/listinfo/computer-go/
> 


_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

RE: [computer-go] UCT vs MC

Reply via email to