On Mon, Jul 4, 2011 at 11:02 AM, Ben Shoemaker <[email protected]> wrote:

> >From: terry mcintyre <[email protected]>
> >"The major one is that the MCTS scoring function is imperfect;
> historically, programs have snatched defeat from the jaws of victory by
> letting points be nibbled away in yose."
>
> (Apologies to those who understand go and computer-go better than me--these
> are just my thoughts on the discussion.)
>
> There are several elements within this debate of "play to maximize wins"
> versus "play to maximize points":
> 1) What strategy is perfect play?
> 2) What strategy is strongest with MCTS?
> 3) What strategy is closest to human play?
> 4) Would a combination of strategies be stronger than either alone?
>
> Let's examine these elements further:
>
> 1) According to the rules of Go, the winner is the player with the highest
> score, but any win is equivalent to any other win--winning by 0.5 points is
> enough.  So perfect play would maximize wins but not necessarily points.
>

I think you are right.  In fact, you say "not necessarily" but I say
"definitely not": you won't maximize points by playing to win.
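To make the distinction concrete, here is a toy Python sketch (the playout
numbers are invented for illustration, not taken from any program): from the
same playout statistics, the move that wins most often and the move with the
best average point margin can be different moves.

```python
# Invented playout results: final score margins (positive = we win)
# recorded for two candidate moves.  Numbers are purely illustrative.
playouts = {
    "safe_move":   [0.5, 1.5, 0.5, 2.5, 0.5],       # wins 5/5, all small
    "greedy_move": [20.5, -3.5, 15.5, -1.5, -0.5],  # wins 2/5, some huge
}

def win_rate(margins):
    return sum(m > 0 for m in margins) / len(margins)

def avg_margin(margins):
    return sum(margins) / len(margins)

best_by_wins = max(playouts, key=lambda mv: win_rate(playouts[mv]))
best_by_points = max(playouts, key=lambda mv: avg_margin(playouts[mv]))

print(best_by_wins)    # safe_move: 100% win rate, +1.1 average margin
print(best_by_points)  # greedy_move: 40% win rate, +6.1 average margin
```

A point-maximizer prefers greedy_move even though it loses more than half
its playouts; a win-maximizer takes the sure half-point wins.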


>
> However, the winner is determined by points, so an accurate count of points
> (evaluation) is necessary to determine the winner.  At the end of the game,
> this is trivial.  Earlier in the game this is harder.  A perfect evaluation
> function would lead to perfect play--only winning moves would be played.
>  Most current go programs seem to use the "play to maximize wins" strategy
> but so far none can play perfectly so we can say that their evaluation
> functions are not perfect.  With a perfect evaluation function, the "play to
> maximize points" strategy should also lead to perfect play.
>

This is true, but with a perfect evaluation function this all
becomes irrelevant.   The whole modus operandi of current methods is
computing probabilities and statistics.

Another way to see this is that if you win maximally (in the point sense)
you also win.   So winning by the most points is a more difficult goal, one
that subsumes just winning.


>
> 2) Many go program authors have stated that "play to maximize wins" is
> stronger than "play to maximize points".  I think this is because their
> evaluation functions are imperfectly optimistic--the program counts points
> that future play does not deliver.  Depending on the margin of error in the
> score estimation, this can turn a win into a loss.  By focusing on wins
> rather than points, current programs minimize the effect of the "optimistic
> score estimation" problem.
>
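The "optimistic score estimation" problem above can be put in a minimal
model (my own construction, not something from this thread): treat the
score estimate as the true margin plus Gaussian noise, and compare moves by
estimated points versus by the probability that the true margin is positive.

```python
import math

def win_prob(est_margin, stderr):
    """P(true margin > 0) if the estimate is Normal(true margin, stderr)."""
    return 0.5 * (1.0 + math.erf(est_margin / (stderr * math.sqrt(2.0))))

# Two hypothetical moves: a large but noisy lead vs a small, certain one.
risky = win_prob(6.0, 8.0)   # roughly 0.77
solid = win_prob(2.0, 0.5)   # essentially 1.0

print(risky < solid)  # True: fewer estimated points, but the surer win
```

Maximizing estimated points picks the six-point lead; once the margin of
error is accounted for, the two-point lead is the better bet.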
> 3) Humans seem to play with a combination of the two strategies--and every
> human might use a different combination.  Seeing all the way through a game
> to the end score is difficult from the beginning of the game, so we analyze
> "local" situations for their point values and combine the local situations
> to approximate the global situation.  As the game progresses, the score
> estimation becomes more accurate and human players adjust their strategy
> according to the margin of error.  If they are way behind, they play very
> aggressively or resign.  If they are slightly behind, they play slightly
> aggressively to catch up.  If they are slightly ahead, they play safely to
> secure the win.  If they are way ahead, they play very safely or pass to
> prompt their opponent to resign.  While "playing human-like moves" is a
> separate goal from "playing to maximize wins", that does not mean that any
> deviation from pure "playing to maximize wins" will make a given program
> weaker and serve only the goal of "playing human-like moves".  Even if
> no one has yet found such an improvement, it certainly could exist in
> theory.
>

Playing to win is the only strategy; the only question is how to improve
our estimate of winning chances, and it's certainly possible that figuring
out how to factor in other things (such as consolidation or "yose") could
improve that estimate.    I don't have any problem with that.    It's just
that this is not very interesting compared to many other, more important
factors which affect our winning chances.

In fact, I think this is a little bit like treating the symptom instead of
the disease.   If you figure out how to improve on these other more
interesting problems you will get better behavior as a natural side-effect,
 at least in the cases that really matter.



> 4) Until a perfect evaluation function is implemented, programmers will
> wonder (and experimentally test) whether the "play to maximize wins"
> strategy is optimal for their imperfect evaluation function.  So far, it
> seems to be the
> strongest strategy, but current programs do have known deficiencies, and
> there is no proof that a combination of strategies would always be
> weaker--especially since that might differ for each individual evaluation
> function.
>

Playing to maximize wins is never the wrong strategy; the only issue is how
to do it better.  Obviously, counting points does not do this better, but
maybe something else will.     However, MCTS works by taking statistics on
playouts, so you have to either improve the playouts themselves or figure
out how to bias the results in a way that makes them a better predictor of
actual winning chances.    To do this, you have to override what actually
happens, perhaps with external knowledge.
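One concrete way to bias the backed-up value, sketched here as an
assumption rather than anything this thread specifies: back a playout up as
the win/loss bit plus a small, bounded score bonus, so the program still
plays to win but prefers, among nearly equally winning lines, the fatter
win.

```python
import math

def playout_value(score_margin, bonus_weight=0.01):
    """Backup value for one playout: win/loss bit plus a tiny score bonus."""
    win = 1.0 if score_margin > 0 else 0.0
    # tanh bounds the bonus, so no score difference can outweigh the
    # win/loss signal: every win still ranks above every loss.
    return win + bonus_weight * math.tanh(score_margin / 10.0)
```

With a bonus this small, move selection is unchanged wherever win rates
differ clearly; the score term matters only when they are nearly equal.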

The real message I'm trying to push here is that counting points is
misguided: it does not improve the estimate, but something else might.
The point count is a one-dimensional thing; we need more information than a
single value.  Does the extra point give up one group to get another?   Does
it risk the win?    The point count by itself just doesn't tell you whether
you are being smart or stupid.


>
> The obvious way to improve the strength of a go program is to improve the
> evaluation function (easier said than done).  Classical programs used
> hard-coded go knowledge and it was surprising when MCTS programs surpassed
> them with very little go knowledge and clearly imperfect evaluation.  As
> program authors have found a way to balance the speed and accuracy of
> "heavy" playouts, the MCTS programs have improved further.  Beside improving
> the evaluation function, there may be improvements in strategy that would
> help an imperfect program play stronger.
>
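The "heavy playout" idea in the paragraph above can be sketched as weighted
move sampling; the heuristic names and weights here are placeholders of my
own, not any particular program's.

```python
import random

def pick_move(candidates, heuristic_weight):
    """Sample one move with probability proportional to its heuristic weight."""
    weights = [heuristic_weight(m) for m in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

# Hypothetical weights: bias playouts toward captures and atari escapes,
# while still occasionally playing an arbitrary move.
scores = {"capture": 10.0, "escape_atari": 8.0, "random_move": 1.0}
move = pick_move(list(scores), scores.get)
```

The heavier the heuristics, the slower each playout; the balancing act the
paragraph describes is exactly the trade-off between weight quality and
playout count.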

I think you made some nice observations.

Don




>
> Ben Shoemaker.
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>