Hi Phillip,
 
Thank you for your help and explanations :)
 
Dave

________________________________

From: computer-go-boun...@computer-go.org on behalf of Philipp Hennig
Sent: Sat 7 Mar 2009 23:50
To: computer-go@computer-go.org
Subject: Re: [computer-go] re-CGT approximating the sum of subgames



Hi Dave,

sorry that I'm so slow in replying.

You are right that there seems to be some confusion about the origin 
of the Thompson heuristic, and I am not entirely clear about its 
history myself. My understanding so far is that the heuristic wasn't 
actually invented by Thompson as such. Rather, it uses a principle 
first shown by him, in the 1933 paper I cited. It works like this:

Assume you don't know precise values for a set of variables (say, the 
values of several available moves), but only a set of possible values 
for each of them, together with probabilities for those values. You 
can then draw one sample from each of these probability 
distributions. The probability that a given element of the set has 
the maximal value under your current beliefs is then equal to the 
probability that the corresponding sample is the largest among the 
set of samples. I don't know who was the first to use this idea to 
guide exploration (you may want to have a look at David Stern's PhD 
thesis on uncertainty in Go; he may be among the first people to use 
it in this way. I'd like to provide you with a link, but I currently 
can't access his website. A search for "David Stern Microsoft" should 
help).

So the "Thompson Heuristic" name is shorthand for the idea of keeping 
track of a whole distribution of possible values for each move and 
guiding exploration by choosing greedily among samples drawn from 
these distributions. In doing so, we end up with a policy that starts 
out as entirely random exploration and converges, in a principled 
way, to a deterministic, greedy policy as our beliefs converge.
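A minimal sketch of the idea in Python, in case it helps: the move names, 
win/loss counts, and the choice of Beta posteriors over win rates are all 
my own illustrative assumptions, not anything from Thompson's paper.

```python
import random

random.seed(0)

# Hypothetical per-move beliefs: (wins, losses) counts from simulated
# playouts, modelled as Beta(wins + 1, losses + 1) posteriors over each
# move's win probability. The moves and counts here are made up.
beliefs = {
    "move_a": (12, 5),
    "move_b": (3, 4),
    "move_c": (1, 1),   # barely explored: wide posterior
}

def thompson_pick(beliefs):
    """Draw one sample per move from its posterior; play the argmax."""
    samples = {move: random.betavariate(wins + 1, losses + 1)
               for move, (wins, losses) in beliefs.items()}
    return max(samples, key=samples.get)

# Wide posteriors make the choice nearly random (exploration); as the
# counts grow, the posteriors sharpen and the pick turns greedy.
counts = {move: 0 for move in beliefs}
for _ in range(10_000):
    counts[thompson_pick(beliefs)] += 1
print(counts)
```

Note that no explicit exploration parameter appears anywhere: the amount 
of exploration falls out of the width of the posteriors themselves.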


On the ideas you mention in your mail: It might well be possible to 
detect sub-games approximately, estimate their temperature by some 
statistical measure, and then focus UCT on areas of the game that 
look interesting under that measure. Many other AI challenges have 
seen surprising improvements based on unexpected approximate 
simplifications.
But you have to be aware that each of the approximations you propose 
introduces an error of unknown size and direction in your inference 
(for example, very hot games have a habit of looking perfectly cold, 
much more so than less hot games, because there is just _one_ 
particularly good move, and everything else about them is really 
boring. If your method misses such games, it will miss the very 
winning move it is looking for). So you may have to spend a lot of 
time optimising your method by trial and error, to avoid such mishaps, 
and it may be very hard to test for such issues short of actually 
playing hundreds of individual games against your machine and judging 
its gameplay yourself. (Remember that we have no reference values to
compare to for the temperature of opening positions in Go. There are 
only a handful of End-Game positions for which people have 
painstakingly derived the temperature. See Elwyn Berlekamp and David 
Wolfe's book "Mathematical Go: Chilling Gets the Last Point").
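To make that pitfall concrete, here is a toy calculation of my own (the 
numbers are invented for illustration): if a position has n candidate 
moves and exactly one of them is hot, a method that inspects only k 
moves sampled uniformly at random misses the hot move with probability 
(n - k) / n, so a naive spread-based temperature estimate usually 
reports the position as cold.

```python
import random

random.seed(1)

# Toy "hot" position: 50 candidate moves, exactly one gains 10 points,
# everything else gains nothing. All numbers here are made up.
values = [0.0] * 49 + [10.0]

def sampled_spread(values, k):
    """Naive 'temperature' estimate: max - min over k sampled moves."""
    sample = random.sample(values, k)
    return max(sample) - min(sample)

# Sampling 10 of 50 moves without replacement misses the single hot
# move with probability (50 - 10) / 50 = 0.8, so roughly 80% of the
# time the position reads as perfectly cold.
misses = sum(sampled_spread(values, 10) == 0.0 for _ in range(1_000))
print(misses / 1_000)
```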

It was this prospect that drove me away from the idea of divide and 
conquer. I wanted to have a method that is guaranteed to converge to 
the right solution eventually, so I turned away from CGT. But my 
yardstick is to get a PhD in probability theory, not to build the 
world's best Go machine, so my scepticism shouldn't worry you too 
much. :-)

Good luck, and happy hacking!

Philipp
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

