Mark Boon wrote:
>
> On 1-apr-08, at 17:37, Don Dailey wrote:
>> That's partly why I'm interested in exploring "on the fly" learning.
>> Learning outside the context of the position being played may not have
>> much relevance.
>
> That would be most interesting indeed. I'd like to try but keep
> running into obstacles.
Yes, this idea rolls off the lips nicely, but implementing it is another
thing!

> For example: at the moment I have just a handful of patterns. These
> patterns are important or 'urgent' if you like and they are already
> enough to overcome the slow-down caused by pattern-matching. At the
> moment I play a pattern randomly without distinction between them.

UCT with MC is basically on-the-fly learning if you think about it. The
learning is in the tree statistics, not in the patterns or playouts, and
it's more like rote learning with no generalization.

The playouts are a device that doesn't technically belong in these
programs - they are a practical concession. It's not practical to build a
tree to the end of the game; that would quickly exhaust memory. The
playouts are a proxy, a substitute, a way to pretend you are following a
tree that is still being built. So the artificial part of this (in some
sense) is the playouts and the patterns that go with heavy playouts. Of
course, since the earlier MC programs we have moved the other way and made
the tree act more like the heavy playouts by imposing a priori knowledge
on the tree itself, just as we do with the playouts. Sometimes I get anal
about these things and wish there were a common, consistent framework
without any artificial separation.

What has occurred to me is to try to find a sophisticated way to eliminate
the tree and yet still maintain the specificity that a tree gives you. My
first, very naive attempt was to use 3x3 patterns to specify a move
instead of 1x1 patterns (the point in question). In other words, the move
e5 on an empty board is a different move than e5 when there is a stone on
e6, for instance. So I tried using UCT in hash-table mode, where the edges
of the graph were 3x3 patterns and each point on the board was part of the
pattern signature (so the same patterns were different if they appeared on
different points of the board).

So I still had a tree of sorts, but each node was shared by many positions
that probably didn't resemble each other except at one local point. That
is no good. In fact, it is barely better than just using the points
directly. There is very little of the specificity of a tree.

So I believe a better approach is a heavy-playout approach with NO tree.
Instead, rules would evolve based on knowledge learned from each playout -
rules that would eventually turn uniformly random moves into highly
directed ones. All-moves-as-first teaches us that in the general case a
move that is good now is good later, or vice versa. But it needs to go way
farther than that. It needs to "act like a tree" when something specific
needs to be handled and generalize when that is most appropriate.

If something like this could be made to work, a tree could probably be
built on top of it if desired. This would be a super-playout approach. It
would be interesting to see how far you could take a tree-less approach
like this. Certainly you should be able to do far better than straight
tree-less MC.

To summarize, I think most of what is needed can be generalized on the
fly, but a good system should be able to automatically adopt any degree of
specificity required. The problem is how to create a mechanism that can
detect that more specificity is required without imposing it on all the
moves. I have some rough ideas on how to do this.
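For concreteness, here is a minimal sketch of the kind of pattern-keyed
statistics table described above, with each entry keyed by a point
together with its 3x3 neighbourhood. This is purely illustrative Java with
made-up names, not the actual code from that experiment:

// Illustrative sketch: playout statistics keyed by (board point, 3x3
// neighbourhood) instead of by a full-position tree node.
import java.util.HashMap;
import java.util.Map;

class PatternStats {
    static final int SIZE = 9;                 // 9x9 board for the example
    static final int EMPTY = 0, BLACK = 1, WHITE = 2, EDGE = 3;

    // key -> {wins, visits}
    private final Map<Long, long[]> table = new HashMap<>();

    // Encode the point itself plus its 8 neighbours, 2 bits per cell.
    // The point index is folded into the key, so the same 3x3 pattern
    // at a different point gives a different key.
    static long key(int[][] board, int x, int y) {
        long k = (long) (x * SIZE + y) << 20;  // point is part of the signature
        int shift = 0;
        for (int dx = -1; dx <= 1; dx++) {
            for (int dy = -1; dy <= 1; dy++) {
                int nx = x + dx, ny = y + dy;
                int c = (nx < 0 || ny < 0 || nx >= SIZE || ny >= SIZE)
                        ? EDGE : board[nx][ny];
                k |= (long) c << shift;
                shift += 2;
            }
        }
        return k;
    }

    // Credit a playout result to the pattern-context a move was played in.
    void record(long key, boolean won) {
        long[] s = table.computeIfAbsent(key, unused -> new long[2]);
        if (won) s[0]++;
        s[1]++;
    }

    double winRate(long key) {
        long[] s = table.get(key);
        return (s == null || s[1] == 0) ? 0.5 : (double) s[0] / s[1];
    }
}

Because the point is folded into the key, the same 3x3 shape at two
different points gets two separate entries, which is exactly the limited
specificity described above: nodes are shared by many positions that have
nothing in common except that one local point.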
- Don

> If I want to make anything 'learning' then I have to harvest patterns
> and somehow compute their importance / urgency. There are multiple
> ways to do that and Remi wrote a paper about one of them. At the
> moment I use the average length a pattern is on the board. Urgent
> patterns remain on the board for only a short while. Whether this is
> better or worse than Remi's way I don't know.
>
> So I have now run the program for a few hundred games, adjusting the
> urgency of the patterns on a continuing basis. And the program got
> weaker! Selecting one at random is apparently superior to selecting a
> pattern based on urgency. This is why I started out with just a few
> patterns: it's easier to see what's happening. When I look at the
> urgencies computed they actually look very reasonable. So that's not
> the problem. When I think about what could cause this, the only thing
> I can imagine is, again, that play becomes more deterministic.
>
> This is a recurring theme in my tests. Apparently it's something
> important that still escapes me and which I have to understand to make
> real progress.
>
> Mark
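For reference, here is a minimal sketch of the lifetime-based urgency
described in the quoted text above (urgent patterns are the ones that tend
to stay on the board only briefly), together with one possible way to keep
the selection stochastic: sample in proportion to urgency rather than
always taking the most urgent pattern. Again this is illustrative Java
with invented names, not anyone's actual code:

// Illustrative sketch of lifetime-based urgency bookkeeping: a pattern's
// urgency is derived from how long it tends to stay on the board; a
// short average lifetime counts as urgent.
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class PatternUrgency {
    private static class Stat { long totalLifetime; long count; }

    private final Map<Integer, Stat> stats = new HashMap<>();  // patternId -> stat
    private final Random rng = new Random();

    // Called when a matched pattern that appeared at moveAppeared
    // disappears from the board at moveDisappeared.
    void recordLifetime(int patternId, int moveAppeared, int moveDisappeared) {
        Stat s = stats.computeIfAbsent(patternId, id -> new Stat());
        s.totalLifetime += (moveDisappeared - moveAppeared);
        s.count++;
    }

    // Short average lifetime -> high urgency; unseen patterns get a
    // neutral value.
    double urgency(int patternId) {
        Stat s = stats.get(patternId);
        if (s == null || s.count == 0) return 1.0;
        double avgLifetime = (double) s.totalLifetime / s.count;
        return 1.0 / (1.0 + avgLifetime);
    }

    // Pick among the currently matched patterns with probability
    // proportional to urgency, so the choice stays stochastic instead of
    // always taking the most urgent one. Assumes 'matched' is non-empty.
    int pick(List<Integer> matched) {
        double total = 0;
        for (int id : matched) total += urgency(id);
        double r = rng.nextDouble() * total;
        for (int id : matched) {
            r -= urgency(id);
            if (r <= 0) return id;
        }
        return matched.get(matched.size() - 1);
    }
}

Sampling in proportion to urgency is only one option, of course; the point
is merely that urgency can bias the choice without making the playouts
deterministic.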