Hideki wrote: "This is obviously wrong in handicap games."
But what else is there?
To start with the perhaps obvious: I believe komi is raised in Pachi
and tails off as the game progresses, i.e. the goal starts high and
lowers as the game goes on.
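As I understand it, Pachi's actual scheme (Baudiš's dynamic komi) is more involved, but the "high komi that fades as the game progresses" idea can be sketched minimally. The 7-points-per-stone value and the 200-move fade below are illustrative assumptions, not Pachi's real parameters:

```python
def dynamic_komi(real_komi, handicap_stones, moves_played, fade_moves=200):
    """Linearly fade an artificial komi bonus back to the real komi.

    Illustrative only: assumes each handicap stone is treated as
    worth ~7 points of extra komi at the start of the game, and
    that the bonus fades to zero over `fade_moves` moves.
    """
    extra = 7.0 * handicap_stones          # assumed per-stone value
    progress = min(moves_played / fade_moves, 1.0)
    return real_komi + extra * (1.0 - progress)

print(dynamic_komi(0.5, 4, 0))    # 28.5 -- goal is high at the start
print(dynamic_komi(0.5, 4, 100))  # 14.5 -- halfway faded
print(dynamic_komi(0.5, 4, 200))  # 0.5  -- back to the real komi
```

The point is only the shape of the schedule: the program evaluates positions against an inflated target early on, so it keeps playing ambitiously instead of "confirming" an early handicap lead.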
When playing as White in the opening, the goal is to survive, maintain
sente where possible, and leave options open. In this sense one plays
moves with complex, indeterminate, high-risk outcomes, which is perhaps
similar to playing under a high komi. Similarly, Black probably plays
overly territorial moves, trying to secure territory and thereby reduce
risk.
We can't hypothesize what a better player might play, can we?
regards
Jonathan
On 4 Jul 2011, at 17:58, Hideki Kato wrote:
Interesting thoughts and I have a question.
How about handicap games? The opponent used in the simulations is the
program itself (self-play) in most (all?) MCTS programs. This is
obviously wrong in handicap games, and the evaluation function returns
incorrect estimates of scores and winning rates. So, the question is
how to maximize winning chances in such games.
Hideki
Ben Shoemaker <[email protected]>:
From: terry mcintyre <[email protected]>
"The major one is that the MCTS scoring function is imperfect;
historically, programs have
snatched defeat from the jaws of victory by letting points be
nibbled away in yose."
(Apologies to those who understand go and computer-go better than
me--these are just my
thoughts on the discussion.)
There are several elements within this debate of "play to maximize
wins" versus "play to
maximize points":
1) What strategy is perfect play?
2) What strategy is strongest with MCTS?
3) What strategy is closest to human play?
4) Would a combination of strategies be stronger than either alone?
Let's examine these elements further:
1) According to the rules of Go, the winner is the player with the
highest score, but a win
is equivalent to any other win--winning by 0.5 points is enough.
So perfect play would
maximize wins but not necessarily points.
However, the winner is determined by points, so an accurate count
of points (evaluation) is
necessary to determine the winner. At the end of the game, this is
trivial. Earlier in the
game this is harder. A perfect evaluation function would lead to
perfect play--only winning
moves would be played. Most current go programs seem to use the
"play to maximize wins"
strategy but so far none can play perfectly so we can say that
their evaluation functions are
not perfect. With a perfect evaluation function, the "play to
maximize points" strategy
should also lead to perfect play.
2) Many go program authors have stated that "play to maximize wins"
is stronger than "play to
maximize points". I think this is because their evaluation
functions are imperfectly
optimistic--the program counts points that future play does not
deliver. Depending on the
margin of error in the score estimation, this can turn a win into a
loss. By focusing on
wins rather than points, current programs minimize the effect of
the "optimistic score
estimation" problem.
3) Humans seem to play with a combination of the two strategies--
and every human might use a
different combination. Seeing all the way through a game to the
end score is difficult from
the beginning of the game, so we analyze "local" situations for
their point values and
combine the local situations to approximate the global situation.
As the game progresses,
the score estimation becomes more accurate and human players adjust
their strategy according
to the margin of error. If they are way behind, they play very
aggressively or resign. If
they are slightly behind, they play slightly aggressively to catch
up. If they are slightly
ahead, they play safely to secure the win. If they are way ahead,
they play very safely or
pass to prompt their opponent to resign. While "playing human-like
moves" is a separate goal from "playing to maximize wins", that does
not mean that anything other than pure "playing to maximize wins" will
necessarily make a given program weaker and serve only the goal of
"playing human-like moves". Even if no one has yet found such an
improvement, it certainly could exist in theory.
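The margin-dependent human behaviour described above can be written down directly. This is just a restatement of the text as code, with hypothetical thresholds (the margin of error is whatever uncertainty the player assigns to their count):

```python
def choose_stance(estimated_margin, margin_of_error):
    """Map a score estimate (positive = ahead) to a rough
    human-style stance. Thresholds are illustrative, not
    taken from any program or player."""
    if estimated_margin < -margin_of_error:
        return "very aggressive (or resign)"   # way behind
    if estimated_margin < 0:
        return "slightly aggressive"           # slightly behind
    if estimated_margin <= margin_of_error:
        return "safe"                          # slightly ahead
    return "very safe"                         # way ahead

print(choose_stance(-20, 5))  # very aggressive (or resign)
print(choose_stance(-2, 5))   # slightly aggressive
print(choose_stance(3, 5))    # safe
print(choose_stance(20, 5))   # very safe
```

As the game progresses and margin_of_error shrinks, the same estimated margin maps to a more committed stance, which matches how humans firm up their strategy late in the game.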
4) Until a perfect evaluation function is implemented, programmers
will wonder (and
experimentally test) if the "play to maximize wins" is optimal for
their imperfect evaluation
function. So far, it seems to be the strongest strategy, but
current programs do have known
deficiencies, and there is no proof that a combination of
strategies would always be
weaker--especially since that might differ for each individual
evaluation function.
The obvious way to improve the strength of a go program is to
improve the evaluation function
(easier said than done). Classical programs used hard-coded go
knowledge and it was
surprising when MCTS programs surpassed them with very little go
knowledge and clearly
imperfect evaluation. As program authors have found a way to
balance the speed and accuracy
of "heavy" playouts, the MCTS programs have improved further.
Beside improving the
evaluation function, there may be improvements in strategy that
would help an imperfect
program play stronger.
Ben Shoemaker.
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
--
Hideki Kato <mailto:[email protected]>