Re: [computer-go] Goal-directedness of Monte-Carlo

Don Dailey Mon, 08 Sep 2008 13:50:28 -0700

On Mon, 2008-09-08 at 20:01 +0200, Gian-Carlo Pascutto wrote:
> Don Dailey wrote:
> 
> > That probably just means I have not stumbled on the right ideas or that
> > I was not able to properly tune it.   I would be delighted if someone
> > was able to show us a workable scheme.   I believe if something is found
> > it will result in a very minor improvement, but that it will be an
> > actual improvement.  
> 
> Would a discrepancy on the amount of ELO gained or lost per handicap
> stone, when comparing MC bots to humans & classical computers, be a good
> measure of the maximum possible improvement?


Maybe.  How could you accurately make such a measurement without
thousands of games?  

The problem seems to be a catch-22.   If you are in a dead won position,
it's really risky telling the program you are NOT in a dead won
position.   It now doesn't understand what is required to win the game;
it only knows that it must win another stone at all costs, whether it's
possible or not.

Most of the time IT IS possible to win another stone or more without
much risk.   But as soon as you do, the dynamic komi adjuster says you
must do it again, and again, and again until you reach a situation where
it is not possible.

And I believe that is where the trouble comes.   At some point, you have
set a goal too high to reach.   This is a signal to the program that it
must try at all costs to win (what appears to it to be) a dead lost game,
and of course it will very likely play a high risk desperation move in
order to please its master.
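
To make the ratchet concrete, here is a minimal sketch of the naive
adjuster described above.  Everything in it is an assumption for
illustration: the logistic win_rate() model, the scale, the 0.9
threshold and the one-point step are made up, and komi is counted in
points rather than stones.  It is not taken from any real program.

```python
import math

def win_rate(margin, extra_komi, scale=3.0):
    """Toy model (an assumption): estimated chance of still winning after
    giving away extra_komi points when the real lead is margin points."""
    return 1.0 / (1.0 + math.exp(-(margin - extra_komi) / scale))

def naive_escalation(margin, threshold=0.9, max_steps=100):
    """Raise the komi by one point every time the game still looks won.

    Returns the list of (extra_komi, win_rate) pairs that were visited.
    """
    extra_komi = 0
    history = []
    for _ in range(max_steps):
        rate = win_rate(margin, extra_komi)
        history.append((extra_komi, rate))
        if rate < threshold:
            # The adjuster only stops once the goal is out of safe reach.
            break
        extra_komi += 1
    return history

if __name__ == "__main__":
    for komi, rate in naive_escalation(margin=20):
        print(f"extra komi {komi:2d}: win rate {rate:.3f}")
```

In this toy model the adjuster never settles on a comfortable goal.  It
keeps raising the bar until the estimated win rate has dropped below the
threshold, which is exactly the point where the program starts to
believe it is no longer safely winning.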

So some simple naive scheme is not going to work.   However this would
probably work pretty well if you have some way to gain prior knowledge
about whether it would be safe to "escalate" or not.  

One simple way that might work with some tuning is to use search.  If
you are winning the game with high confidence, raise the komi by a few
stones and do another search (from scratch) to see if you are still
easily winning.  Perhaps something like a binary search will find the
komi value that gives you a high winning confidence with maximum
greed, or some acceptable balance of the two.
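
A rough sketch of that binary search follows.  The evaluate callback is
hypothetical: it stands for "run a fresh search at this komi and return
the estimated win rate".  The sketch also assumes the win rate never
goes up as the komi goes up, which noisy Monte-Carlo estimates will not
guarantee.  toy_win_rate() is only a stand-in so the example runs.

```python
import math

def find_greedy_komi(evaluate, lo, hi, confidence=0.9):
    """Return the largest komi in [lo, hi] whose win rate is still at
    least `confidence`, or None if even `lo` is not safe.

    evaluate(komi) is a hypothetical callback that runs one search from
    scratch at that komi and returns a win rate between 0 and 1.
    """
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if evaluate(mid) >= confidence:
            best = mid        # still comfortably winning, try to be greedier
            lo = mid + 1
        else:
            hi = mid - 1      # goal too high, back off
    return best

def toy_win_rate(komi, margin=20, scale=3.0):
    """Stand-in for a real search (an assumption, not a real engine)."""
    return 1.0 / (1.0 + math.exp(-(margin - komi) / scale))

if __name__ == "__main__":
    probes = []
    def counted(komi):
        probes.append(komi)
        return toy_win_rate(komi)
    print(find_greedy_komi(counted, 0, 40), "found with", len(probes), "searches")
```

Each probe is a full search, so even a range of 41 komi values costs
about six searches per move in this sketch.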

However, such a scheme is going to cost you resources - which perhaps
may cancel some or all of the benefit.   My own gut feeling tells me
that you are playing with pretty small margins.   At best how much can
we expect to gain?   

I think this is probably something we need to explore and do -
especially if it's important to the reputation of your product, or to
produce a product that mimics more the style of human players who are
less concerned about the beauty of omission.    I personally see a bit
of beauty in this style even though it certainly looks odd when you are
not used to it.   

When losing the game, a dynamic adjuster may be safer.  After all, you
are losing anyway, so why not try something?    It's not risky trying to
win a lost game by picking off whatever stones you can.

- Don



_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
