Hi Dave, sorry that I'm so slow in replying.
You are right that there seems to be some confusion about the origin of the Thompson heuristic, and I am not entirely clear about its history myself. My understanding so far is that the heuristic wasn't actually invented by Thompson as such; rather, it uses a principle he first demonstrated in the 1933 paper I cited. It works like this:
Assume you don't know precise values for a set of variables (say, the values of several available moves), but only a probability distribution over the possible values of each of them. You can then draw one sample from each of these distributions. The probability that a given element of the set has the maximal value under your current beliefs is then equal to the probability that its sample is the largest among the set of samples. I don't know who was the first to use this idea to guide exploration (you may want to have a look at David Stern's PhD thesis on uncertainty in Go; he might be among the first people to use it in this way. I'd like to provide you with a link, but I currently can't access his website; a search for "David Stern Microsoft" should help).
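To make this concrete, here is a toy Python sketch. The Gaussian beliefs and all their parameters are invented purely for illustration; nothing here is from Thompson's paper:

    import random

    # Made-up beliefs about three moves: (mean, stddev) of a Gaussian
    # describing what we currently think each move is worth.
    beliefs = {
        "move_a": (0.55, 0.10),
        "move_b": (0.50, 0.20),
        "move_c": (0.40, 0.05),
    }

    def sample_best(beliefs):
        # Draw one sample per move; return the move whose sample is largest.
        samples = {m: random.gauss(mu, sd) for m, (mu, sd) in beliefs.items()}
        return max(samples, key=samples.get)

    # Over many repetitions, the frequency with which each move wins this
    # little contest matches the probability, under our beliefs, that it
    # really is the best move.
    counts = {m: 0 for m in beliefs}
    for _ in range(100_000):
        counts[sample_best(beliefs)] += 1
    for m, c in counts.items():
        print(m, c / 100_000)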
So the "Thompson Heuristic" name is shorthand for the idea of keeping track of a whole distribution of possible values for each move and guiding exploration by choosing greedily among samples drawn from these distributions. In doing so, we end up with a policy that starts out exploring entirely at random and, as our beliefs sharpen, converges in a principled way to a deterministic, greedy policy.
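In code, one could sketch the whole loop as a toy Beta-Bernoulli bandit. The three "moves" and their win rates are made up, and a real Go program would of course track beliefs per move in every position rather than over a fixed set of arms:

    import random

    true_win_rates = [0.42, 0.50, 0.58]  # unknown to the player
    # Beta(1, 1) priors, i.e. uniform: we start out knowing nothing.
    alpha = [1, 1, 1]
    beta = [1, 1, 1]

    for t in range(10_000):
        # Thompson step: one sample per move from its current posterior,
        # then play the move whose sample is largest.
        samples = [random.betavariate(alpha[i], beta[i]) for i in range(3)]
        move = max(range(3), key=lambda i: samples[i])

        # Simulate a win/loss and update the posterior of the chosen move.
        if random.random() < true_win_rates[move]:
            alpha[move] += 1
        else:
            beta[move] += 1

    # Early on the posteriors are wide and play is nearly uniform; as the
    # counts grow, the posteriors concentrate and the sampled maximum almost
    # always comes from the best move, i.e. play becomes greedy.
    print(list(zip(alpha, beta)))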
On the ideas you mention in your mail: it might well be possible to approximately detect sub-games, approximate their temperature by some statistical measure, and then focus UCT on areas of the game that look interesting under that measure. Many other AI challenges have seen surprising improvements based on unexpected approximate simplifications. But be aware that each of the approximations you propose introduces an error of unknown size and direction into your inference. For example, very hot games have a habit of looking perfectly cold, much more so than less hot games, because there is just _one_ particularly good move and everything else about them is really boring; if your method misses such games, it will miss the very winning move it is looking for. So you may have to spend a lot of time optimising your method by trial and error to avoid such mishaps, and it may be very hard to test for such issues short of actually playing hundreds of individual games against your machine and judging its gameplay yourself. (Remember that we have no reference values to compare against for the temperature of opening positions in Go. There are only a handful of endgame positions for which people have painstakingly derived the temperature; see Elwyn Berlekamp and David Wolfe's book "Mathematical Go: Chilling Gets the Last Point".)
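Here is a made-up illustration of that trap, using one hypothetical "temperature" proxy, namely the spread of estimated move values in a position. The numbers mean nothing in themselves; they only show how the ranking can come out backwards:

    import statistics

    # "Hot" position: one winning move among 99 boring ones.
    hot_position = [0.9] + [0.1] * 99
    # "Warm" position: move values spread evenly between 0.3 and 0.6.
    warm_position = [0.3 + 0.003 * i for i in range(100)]

    print(statistics.pstdev(hot_position))   # ~0.080 -> looks "cold"
    print(statistics.pstdev(warm_position))  # ~0.087 -> looks "warmer"
    # The proxy ranks the genuinely hot position below the warm one, so a
    # method that focuses search on high-proxy regions would skip the very
    # move it is looking for.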
It was this prospect that drove me away from the idea of divide and conquer. I wanted a method that is guaranteed to converge to the right solution eventually, so I turned away from CGT. But my goal is to get a PhD in probability theory, not to build the world's best Go machine, so my scepticism shouldn't worry you too much. :-)
Good luck, and happy hacking!

Philipp