Hi Philipp, Thank you for replying. I've seen your article before, but I read it only superficially then. By the way, I couldn't find any online source about the Thomson Heuristic. Do you know if ther are any? Fistly, I want to use ownership and local correlation statistics from light montecarlo playout for discovering subgames. I'm not particularly focussed on combining UCT and CGT directly. I don't want to perform global UCT or minimax tree search. This kind of tree search should be confined to local subgames only (to improve the accuracy of the mean and temperature of subgames). I only want to use global lookahead to discover subgames, not for exploring the global game tree. Secondly, discovering temperature and mean of subgames: Perhaps it is possible to get a rough approximation of the mean and temperature of subgames from ownership statistics and their correlations. Perhaps it is possible to represent large, cool subgames (ownership distribution near random, low correlation with surroundings) like the center in the opening, by simple statistical properties (the environment) until more details are discovered. Thridly, early in the game, moves don't just affect subgames, they also change how the global game divides into subgames. This is a major complication. Also, early in the game, subgames could have fuzzy (but cool) edges (like a joseki). I don't have a clear picture yet for how to deal with dynamic subgames with fuzzy edges. I'm envisioning a structure containing subgame divisions, pointing to subgames in a hashtable. So this structure would represent the superposition of subgame divisions discovered so far. But this is still very unclear. Also, lookahead within subgames may stuble upon subgame interference in a portion of their subgame trees (which may or may not affect the canonicalform of the combined subgame). I have no idea yet how to deal with this. I know, this is just a bunch of vague ideas with a lot of ifs. So I have some work to do ;) Dave
________________________________ Van: computer-go-boun...@computer-go.org namens Philipp Hennig Verzonden: ma 2-3-2009 1:04 Aan: computer-go@computer-go.org Onderwerp: RE: [computer-go] re-CGT approximating the sum of subgames Hi Martin, Dave and others, I have never posted to this list before (so, hi everyone! :-), but your discussion on approximate CGT caught my eye. Dave. you mentioned that you are interested in trying to mix UCT and CGT: Dave Devos wrote on 19 Feb 2009, 19:33 GMT: > Ok, I'll look into your temperature discovery article. I noticed > that Amazons is mentioned a lot in CGT, but I am not familiar with > the game. I want to use montecarlo to discover subgames. Then I want > to use CGT techniques to evaluate and sum subgames to get a full > board evaluation. If this is possible, it could dramatically reduce > the combinatorial explosion on large boards (like you said, totally > killing full board search) This is the idea. I don't even know if it > is possible, but I definitely want to give it a try. I worked on a similar idea during a short project early on in my PhD (which has since taken other directions). We (David Stern, Thore Graepel and myself) tried to use UCT to measure the temperature of Amazons games. You can find a technical report at http://www.inference.phy.cam.ac.uk/ph347/CPGS-Report_Hennig.pdf (I turned this into my first-year report, so you will have to excuse the length and the rambling at the beginning. If you know about UCT and CGT, you can directly jump to chapters 2.3 or 3.0). The short of it is: The results were not very encouraging. The problem is that TDS is searching for an event (the switch from the "stack" of coupons to play on the board) which happens at some depth >1 in the game tree*, and UCT is not very good at performing inference on such quantities. The only thing it is really good at is inference about the value of the topmost nodes, where the statistical fidelity is good (nodes even a few steps deep into the tree will often only be visited a handful of times, so there is not much data available to make a good estimate from). Further, adding a coupon stack to the overall game seems to sometimes create situations where the roll-outs become biased towards one player. In such situations, it can take UCT very long to discover the right solution, wasting a lot of search time expanding the wrong part of the tree. In chapter 5.1.1 of my report, you will find an experimental study of this behaviour using an artificial game tree. Then, there are a few more subtleties to keep in mind. One is that Go games rarely are truly separable into sub-games (End-Games are an exception, as Martin showed in his paper mentioned in his earlier mail). Independence may hinge on just one particularly "ingenious" move deep in the tree and therefore might be hard to find by Monte- Carlo search (but this is pure conjecture, so don't let yourself be stopped from trying). Finally, searching for independence will cost time itself. So, in the end, you need to search for independence, which takes time and introduces errors, because you might miss crucial connections between sub-games. Then you need to search for the temperature of each sub-game, which is more expensive than just searching for a good move in the sub-game, since the TDS tree is more complex than the board tree itself, and UCT needs way longer to build up any statistical evidence around the interesting bits deep in the search tree (where you also find the best next move). It also causes more uncertainty, since UCT-based TDS will have an unknown error. And, finally, playing HOTSTRAT is in itself an approximation to optimal play, although apparently a good one, and with bounded error. UCT on the whole board, on the other hand, has to struggle with the combinatorial explosion of the tree. But we know it converges to optimal play, eventually. Bottom line: I would be interested to hear about your results. All of my arguments above are obviously not very quantitative, and it might still be possible to solve Go by Divide and Conquer (humans do it, apparently), so I'm not saying you shouldn't try, not at all. But just naively mixing TDS and UCT, as I did, is not recommended. :-) Good luck with your experiments! Philipp *more precisely, the optimal shift happens at depth ~ ((T_max - T) / delta), where T is the temperature of the game, T_max is the value of the top-most coupon and delta is the resolution of the measurement (i.e. the difference between face values of consecutive coupons), because you will typically only have an upper bound of the true temperature, and the value of the topmost coupon on the stack has to be larger than the temperature. Also, TDS robustness to increasing delta or varying delta over the stack is not entirely understood, (see Martin's original TDS paper). In particular, something like a binary search for the right Temperature (starting with a huge delta, and subsequently refining the stack) didn't work for us. _______________________________________________ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
_______________________________________________ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/