On Feb 1, 2008 4:18 PM, terry mcintyre <[EMAIL PROTECTED]> wrote:
> UCT is based on a theory of a multi-armed bandit, with uncertain knowledge
> about which "arms" would be most productive. Is it possible to graft various
> sources of knowledge into a sort of meta-bandit algorithm?

IIRC, MoGo already does some of this. They use RAVE (worth up to 3000 sims?) and they also use some TD to get an initial heuristic worth about 50 sims.

> As to fusing top-level knowledge with random playouts, I love the idea,
> and am trying to imagine how to implement it. One idea is this: certain
> moves in certain situations might trigger a forced reply. Playing one of a
> pair of miai, for instance, should result in a high probability of the
> matching move played as a response - especially if failure to do so would
> result in killing a group.
> Analysis of a group could conclude that life depends on certain external
> liberties, or the ability to play one of two alternate moves, yadda yadda;
> those threats then trigger appropriate automatic responses with high
> probability.

Remember that the arms at the leaves correspond to Monte Carlo playouts. Nothing really stops you from forcing other knowledge into those playouts. I too have imagined miai pairs. MoGo plays out local exchanges based on patterns before going to a more random sim. CrazyStone uses move probability heuristics to select moves (some responses get a high probability and others a much lower one).
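
To make the two ideas above concrete, here is a rough Python sketch. It is not how MoGo or CrazyStone actually do it; all names (Arm, select_ucb1, playout_move, forced_replies, reply_prob) are made up for illustration. The first part seeds a bandit arm with a heuristic that counts as a number of imaginary sims (the 50 is just borrowed from the figure above). The second part is a playout step that answers a triggering move, e.g. one half of a miai pair, with its matching reply at high probability and otherwise falls back to a uniformly random move.

```python
import math
import random


class Arm:
    """One candidate move, seeded with a heuristic prior (hypothetical helper)."""

    def __init__(self, prior_winrate, prior_weight=50):
        # The prior acts like `prior_weight` imaginary playouts that were
        # won at rate `prior_winrate`. prior_weight must be > 0.
        self.visits = float(prior_weight)
        self.wins = prior_winrate * prior_weight

    def update(self, won):
        # Fold in the result of one real playout.
        self.visits += 1.0
        if won:
            self.wins += 1.0

    def mean(self):
        return self.wins / self.visits


def select_ucb1(arms, c=1.0):
    """Return the index of the arm with the highest UCB1 score."""
    total = sum(a.visits for a in arms)

    def score(i):
        a = arms[i]
        # +1 keeps the log non-negative even for tiny prior weights.
        return a.mean() + c * math.sqrt(math.log(total + 1.0) / a.visits)

    return max(range(len(arms)), key=score)


def playout_move(last_move, legal_moves, forced_replies, reply_prob=0.9, rng=random):
    """Pick the next playout move.

    forced_replies maps a triggering move to its matching reply (assumed to
    come from some earlier analysis, e.g. detected miai pairs).
    """
    reply = forced_replies.get(last_move)
    if reply in legal_moves and rng.random() < reply_prob:
        return reply
    # No applicable forced reply: ordinary random playout move.
    return rng.choice(legal_moves)
```

With a prior weight of 50, real playouts only start to dominate the estimate after roughly that many visits, which is one way to read "worth about 50 sims". Keeping reply_prob below 1 leaves the playout some room to ignore a reply that the analysis got wrong.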

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/