Hi Dave, sorry that I'm so slow in replying.
You are right that there seems to be some confusion about the origin of the Thompson heuristic, and I am not entirely clear about its history myself. My understanding so far is that the heuristic wasn't actually invented by Thompson as such; rather, it uses a principle he first demonstrated in the 1933 paper I cited. It works like this:
Assume you don't know precise values for a set of variables (say, the values of several available moves), but only a probability distribution over the possible values of each of them. You can then draw one sample from each of these distributions. The probability that a given element of the set has the maximal value under your current beliefs is then equal to the probability that its sample is the largest among the set of samples. I don't know who was the first to use this idea to guide exploration (you may want to have a look at David Stern's PhD thesis on uncertainty in Go; he might be among the first people to use it in this way. I'd like to provide you with a link, but I currently can't access his website; a search for "David Stern Microsoft" should help).
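To make this concrete, here is a toy Python sketch. The Gaussian beliefs and all their parameters are invented purely for illustration; nothing here is from Thompson's paper:

    import random

    # Made-up beliefs about three moves: (mean, stddev) of a Gaussian
    # describing what we currently think each move is worth.
    beliefs = {
        "move_a": (0.55, 0.10),
        "move_b": (0.50, 0.20),
        "move_c": (0.40, 0.05),
    }

    def sample_best(beliefs):
        # Draw one sample per move; return the move whose sample is largest.
        samples = {m: random.gauss(mu, sd) for m, (mu, sd) in beliefs.items()}
        return max(samples, key=samples.get)

    # Over many repetitions, the frequency with which each move wins this
    # little contest matches the probability, under our beliefs, that it
    # really is the best move.
    counts = {m: 0 for m in beliefs}
    for _ in range(100_000):
        counts[sample_best(beliefs)] += 1
    for m, c in counts.items():
        print(m, c / 100_000)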
So the "Thompson Heuristic" name is shorthand for the idea of keeping track of a whole distribution of possible values for each move and guiding exploration by choosing greedily among samples drawn from these distributions. In doing so, we end up with a policy that starts out exploring entirely at random and, as our beliefs sharpen, converges in a principled way to a deterministic, greedy policy.
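In code, one could sketch the whole loop as a toy Beta-Bernoulli bandit. The three "moves" and their win rates are made up, and a real Go program would of course track beliefs per move in every position rather than over a fixed set of arms:

    import random

    true_win_rates = [0.42, 0.50, 0.58]  # unknown to the player
    # Beta(1, 1) priors, i.e. uniform: we start out knowing nothing.
    alpha = [1, 1, 1]
    beta = [1, 1, 1]

    for t in range(10_000):
        # Thompson step: one sample per move from its current posterior,
        # then play the move whose sample is largest.
        samples = [random.betavariate(alpha[i], beta[i]) for i in range(3)]
        move = max(range(3), key=lambda i: samples[i])

        # Simulate a win/loss and update the posterior of the chosen move.
        if random.random() < true_win_rates[move]:
            alpha[move] += 1
        else:
            beta[move] += 1

    # Early on the posteriors are wide and play is nearly uniform; as the
    # counts grow, the posteriors concentrate and the sampled maximum almost
    # always comes from the best move, i.e. play becomes greedy.
    print(list(zip(alpha, beta)))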
On the ideas you mention in your mail: it might well be possible to approximately detect sub-games, approximate their temperature by some statistical measure, and then focus UCT on areas of the game that look interesting under that measure. Many other AI challenges have seen surprising improvements based on unexpected approximate simplifications. But be aware that each of the approximations you propose introduces an error of unknown size and direction into your inference. For example, very hot games have a habit of looking perfectly cold, much more so than less hot games, because there is just _one_ particularly good move and everything else about them is really boring; if your method misses such games, it will miss the very winning move it is looking for. So you may have to spend a lot of time optimising your method by trial and error to avoid such mishaps, and it may be very hard to test for such issues short of actually playing hundreds of individual games against your machine and judging its gameplay yourself. (Remember that we have no reference values to compare against for the temperature of opening positions in Go. There are only a handful of endgame positions for which people have painstakingly derived the temperature; see Elwyn Berlekamp and David Wolfe's book "Mathematical Go: Chilling Gets the Last Point".)
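Here is a made-up illustration of that trap, using one hypothetical "temperature" proxy, namely the spread of estimated move values in a position. The numbers mean nothing in themselves; they only show how the ranking can come out backwards:

    import statistics

    # "Hot" position: one winning move among 99 boring ones.
    hot_position = [0.9] + [0.1] * 99
    # "Warm" position: move values spread evenly between 0.3 and 0.6.
    warm_position = [0.3 + 0.003 * i for i in range(100)]

    print(statistics.pstdev(hot_position))   # ~0.080 -> looks "cold"
    print(statistics.pstdev(warm_position))  # ~0.087 -> looks "warmer"
    # The proxy ranks the genuinely hot position below the warm one, so a
    # method that focuses search on high-proxy regions would skip the very
    # move it is looking for.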
It was this prospect that drove me away from the idea of divide and conquer. I wanted a method that is guaranteed to converge to the right solution eventually, so I turned away from CGT. But my goal is to get a PhD in probability theory, not to build the world's best Go machine, so my scepticism shouldn't worry you too much. :-)
Good luck, and happy hacking!

Philipp