Re: [computer-go] UCT vs MC

Don Dailey Mon, 01 Jan 2007 11:52:48 -0800

On Mon, 2007-01-01 at 20:10 +0100, [EMAIL PROTECTED] wrote:
> 
> > I'm curious about the full width depth and the principal variation
> depth to
> > compare UCT wilth alpha-beta.
> The comparison is not so easy to do I think, because using MC as an
> evaluation 
> function for alpha beta, you have to do several simulations for one 
> evaluation and take the average. So the question is "how many
> simulations to 
> do" (tradeoff "false alpha-beta cuts"/"depth")? The right should be
> "the 
> number which makes the stronger player". I did not made such
> experiments. 
> Perhaps someone did?


I did something similar.   But it takes a huge amount of horsepower to
zero in on the right value.

What seems to be the case is that you need less simulations as you
search deeper.   Of course it's always better to do more simulations
but the question is the trade-off of whether it's better to go 1 ply
deeper doing less,  or one ply less doing more. 

Unfortunately, to get the right answer requires a lot of work.  There
are other variables too such as how much cheating should you do.   You
can seriously reduce the number of simulations you do at end nodes by
stopping early when it appears unlikely your score will fall within
the alpha/beta window.    You can do this by asking the question, "what
would be my score if the next N games were all wins or losses?"

I also discovered that it is not efficient doing too few simulations.
If you are not doing enough,  doubling the number of simulations only
increases the search effort slightly,  or in some cases it improves
it.   This is almost certainly because of move ordering issues,  very
difficult to get good move ordering with a fuzzy evaluator.   

I have a theory on how to make straight alpha beta work with monte
carlo evaluations at end nodes.   You want to use monte carlo
as an evaluation function, but you want it applied to quiet positions.
I think you need to take your end node positions before you apply monte
carlo as an evaluation and do something similar to the following:

   1. Identify clearly dead and live groups.

   2. Create a "proxy position" that is similar to the position you want
      to evaluate, but has been "fixed-up" to be less confusing to the
      monte carlo simulations.    

      This could involve placing stones so that living groups are not
      touched in the simulation.  It might involve artificially killing
      dead groups so that monte carlo is not distracted killing them.

      A veto list involves identifying moves that cannot possibly help
      and telling the monte carlo simulation about these moves.   Even
      if you know the move can't possible help in the next 4 or 5 moves,
      it might be a major boost of quality to simulation.

      It might involve move suggestions, veto-tables, etc. to help the
      monte carlo search.    The idea is to do anything that makes the
      final position more quiet and monte carlo search more relevant
      and doing it safely - only what you can be sure of.

   3. And your monte carlo search should have some intelligence built
      in like Mogo does,  it's not 100 percent random.

This is just an idea that hasn't been tried or tested as far as I know,
something to be taken with a grain of salt! 

- Don




_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] UCT vs MC

Reply via email to