Re: [computer-go] State of the art of pattern matching
On Tue, 2008-04-01 at 15:18 -0400, Joshua Shriver wrote:
> Do you have a link to those papers?

There is still one listed on the computer go bibliography:
http://www.cs.ualberta.ca/~games/go/compgo_biblio/

The links don't seem to work, so I set up a copy here:
http://nngs.ziar.net/cgml/gotxt/kojima.ps

HTH,
AvK
Re: [computer-go] State of the art of pattern matching
Jonas Kahn wrote:
> I guess you have checked that with your rules for getting probability
> distributions out of gammas, the mean of the probability of your move 1
> was the one that you observed (about 40%)?

If I understand your post, there may be a misunderstanding, and it's my fault. Here gamma is not a gamma function nor a gamma distribution but a constant. It is just the same Greek letter. I don't remember if it was in Crazy Stone's description or in some other paper I read to understand what Bradley-Terry models are; I just got used to that notation.

The best thing about BT models (for me) is the extreme simplicity at runtime (calibration may have more black magic), so:

  prob of move i = gamma[i] / SUM_j gamma[j]

where gamma[·] is a constant each pattern has. Setting those constants is the learning process.

The 40% is obtained between moves 20 and 119 of over 55K games. That is more than 5M competitions. The patterns are learned for other move numbers as well. It considers the urgency modified by the first time the pattern appears. Also ko, but ko is learned to (hopefully) find ko threats; its impact on guessing is less important.

It counts the number of times the first move was the right move, then the first + second, etc.

The reason why two different ways of measuring the same thing:

a. Probability expected for each move.
b. Number of guesses of the best move, 2nd best, 3rd best, etc.

don't match is mainly academic, because from a practical point of view the important thing is to have a good move generator and (even more important) to understand its limitations. It cannot be considered the only source of moves, but the first moves in terms of urgency are moves that should be paid attention to.

I admit that what I call a probability distribution over the legal moves is not really balanced, and I don't understand why. Nevertheless, in terms of urgency, better moves get higher urgency. This is all I really need. Of course, I would welcome an explanation of why the two things don't match, or someone else verifying whether his patterns give correct values.

Jacques.
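As an illustration only (not Jacques's actual code), the two measurements he compares could be computed like this, assuming one gamma per candidate move; all names and data structures here are made up:

def move_probabilities(gammas):
    """Bradley-Terry: prob of move i = gamma[i] / sum_j gamma[j]."""
    total = sum(gammas.values())
    return {move: g / total for move, g in gammas.items()}

def evaluate(positions):
    """positions: list of (gammas_for_legal_moves, expert_move) pairs."""
    mean_prob = 0.0          # (a) expected probability of the expert move
    rank_hits = {}           # (b) how often the expert move is ranked 1st, 2nd, ...
    for gammas, expert_move in positions:
        probs = move_probabilities(gammas)
        mean_prob += probs[expert_move]
        ranking = sorted(probs, key=probs.get, reverse=True)
        rank = ranking.index(expert_move) + 1
        rank_hits[rank] = rank_hits.get(rank, 0) + 1
    return mean_prob / len(positions), rank_hits

If the model were perfectly calibrated, measurement (a) and the rank-1 rate in (b) would agree; the thread's question is why they don't.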
Re: [computer-go] Some ideas how to make strong heavy playouts
On 1-apr-08, at 17:37, Don Dailey wrote:
> That's partly why I'm interested in exploring "on the fly" learning.
> Learning outside the context of the position being played may not have
> much relevance.

That would be most interesting indeed. I'd like to try but keep running into obstacles.

For example: at the moment I have just a handful of patterns. These patterns are important or 'urgent' if you like, and they are already enough to overcome the slow-down caused by pattern-matching. At the moment I play a pattern randomly, without distinction between them.

If I want to make anything 'learning' then I have to harvest patterns and somehow compute their importance / urgency. There are multiple ways to do that, and Remi wrote a paper about one of them. At the moment I use the average length of time a pattern is on the board. Urgent patterns remain on the board for only a short while. Whether this is better or worse than Remi's way I don't know.

So I have now run the program for a few hundred games, adjusting the urgency of the patterns on a continuing basis. And the program got weaker! Selecting one at random is apparently superior to selecting a pattern based on urgency. This is why I started out with just a few patterns: it's easier to see what's happening. When I look at the urgencies computed they actually look very reasonable. So that's not the problem. When I think about what could cause this, the only thing I can imagine is, again, that play becomes more deterministic.

This is a recurring theme in my tests. Apparently it's something important that still escapes me and which I have to understand to make real progress.

Mark
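For what it's worth, here is a toy sketch of one reading of Mark's "average time on the board" urgency (urgency as the inverse of mean lifetime) and of weighted versus uniform selection. The names, constants, and the inverse-lifetime formula are illustrative assumptions, not his implementation:

import random

lifetimes = {}   # pattern id -> list of observed lifetimes (in moves, >= 1)

def record_lifetime(pattern, moves_on_board):
    lifetimes.setdefault(pattern, []).append(moves_on_board)

def urgency(pattern):
    obs = lifetimes.get(pattern)
    if not obs:
        return 1.0                      # unseen patterns get a neutral urgency
    return 1.0 / (sum(obs) / len(obs))  # short-lived patterns are more urgent

def pick_move(matched):
    """matched: list of (move, pattern) pairs found in the current position."""
    if not matched:
        return None
    weights = [urgency(p) for _, p in matched]
    # Mark's stronger variant would simply be: random.choice(matched)[0]
    return random.choices(matched, weights=weights, k=1)[0][0]

The weighted version is more deterministic in the sense that the same few urgent patterns dominate the selection, which is the effect Mark suspects.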
Re: [computer-go] State of the art of pattern matching
On Wed, Apr 02, 2008 at 02:13:45PM +0100, Jacques Basaldúa wrote:
> Jonas Kahn wrote:
>
> > I guess you have checked that with your rules for getting probability
> > distributions out of gammas, the mean of the probability of your move 1
> > was the one that you observed (about 40%)?
>
> If I understand your post, there may be a misunderstanding, and it's my fault.
> Here gamma is not a gamma function nor a gamma distribution but a constant.
> It is just the same Greek letter. I don't remember if it was in Crazy Stone's
> description or in some other paper I read to understand what Bradley-Terry
> models are; I just got used to that notation.

Sorry, you were clear; it's only me who did not remember BT correctly. I thought of a transform P = Lambda(gamma2 - gamma1) in the end, like for Elo estimation.

The heavy tails could still very well come from inadequacy of the formula P(i) = gamma(i) / sum_j gamma(j) for the smaller probabilities.

The most natural explanation for your underestimating the probability that the first move is the right one would come from correlations between patterns. A short toy model, where only the first move can have one or two patterns with the same individual value but varying combined value, suggests that correlations between patterns would have to be positive to explain your result. I am a bit surprised, since I would have thought that correlations between patterns would rather be negative...

Maybe you can test that in the following way: if you have, say, 500 patterns that have a much greater value than the others, you could add to your learning an entry 'there is both pattern 1 and pattern 2' for all pairs of patterns among those 500, and see if you get better predictions in the end. Of course, if too many patterns are really relevant, this is hard to do.

Anyhow, as you said, what's important is having good suggestions for urgent moves, not this kind of discrepancy.

Jonas
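One way that test could look as a sketch, assuming the generalized Bradley-Terry "team" form in which a move's strength is the product of the gammas of all features it matches (that form, and every name below, is an assumption for illustration, not necessarily Jacques's exact setup):

from itertools import combinations

TOP_PATTERNS = set()   # e.g. the ~500 patterns with the largest gammas

def features_with_pairs(patterns_present):
    """patterns_present: the single-pattern features matched by a move."""
    feats = list(patterns_present)
    strong = [p for p in patterns_present if p in TOP_PATTERNS]
    for a, b in combinations(sorted(strong), 2):
        feats.append(("pair", a, b))    # one extra feature per strong pair
    return feats

def team_gamma(feats, gammas):
    """Move strength: product of the gammas of all its features."""
    g = 1.0
    for f in feats:
        g *= gammas.get(f, 1.0)         # unknown features count as neutral
    return g

The pair features would then be learned alongside the single-pattern gammas, and their fitted values indicate whether the strong patterns are positively or negatively correlated.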
Re: [computer-go] Some ideas how to make strong heavy playouts
> > By contrast, you
> > should test (in the tree) a kind of move that is either good or average,
> > but not either average or bad, even if it's the same amount of
> > information. In the tree, you look for the best move. Near the root at
> > least; when going deeper and the evaluation being less precise, you
> > merely look for good moves, that keep a trustworthy evaluation of the
> > position, and try to avoid brittle ones.
>
> Again, you are semantically challenged but basically correct. A move
> that you can statically evaluate as being on the lower end of the scale
> does not have much information content - in other words evaluating it in
> the tree has very little chance of changing the score.

OK. Then we agree. In fact, that's what UCB does: look for more information about what could be the best.

By the way, there are schemes for assigning an a priori value to a node, so as to progressively widen the search. Has anyone tried changing the a priori uncertainty instead? An atari or such would get a higher uncertainty, but maybe not a higher a priori value.

> The alpha-beta procedure in general is about information content in a
> big way. Many of the branches cut off in alpha-beta spring from great
> moves that have no chance of changing the score, so there is no useful
> information there. You would not look at those moves just because
> they happen to be "really great" moves.
>
> > In the playouts, that's another matter. I would say that (almost) always
> > playing 'out of atari' would add stability, much in the way Magnus
> > Persson very well explained.
> >
> > What do we want of playout policies?
> > As static evaluation functions, we would want them to give the right
> > ordering of move values, with differences as wide as possible.
> > More precisely, we would want that among the best moves; it's not
> > important if the evaluation is not precise for bad moves.
>
> Maybe I'm semantically challenged now, but the only correct ordering of
> move values is win or lose. I suppose you are saying that there
> should be a variety of scores that somehow reflect the difficulty of
> winning?

Yes, of course the true evaluation is zero or one, and we cannot access it; more to the point, we cannot even approach it.

Any evaluation function such that, in a given position, the evaluation of the winning moves is better than that of the losing moves would be just as ideal. In fact, any evaluation such that the best move is a winning move, if there is one, would still be ideal. And we could have a useful ordering of the losing moves if all moves lose.

Maybe one of these evaluation functions is available as a playout policy. Of course we cannot find one, but we could use this hypothetical policy as a reference.

Another slightly more concrete possible reference playout policy would be: the full-fledged program plays the playout. *If* the programs are still random enough, this would make a lot of sense: we are then evaluating the winning probability of the program against itself. That's at least one of the opponents that is perfectly modelled. Such an ideal playout policy would not undervalue light play, for example, as it would know how to connect when threatened to be cut.

Jonas
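One way the "same a priori value but different a priori uncertainty" idea could be expressed, purely as a sketch, is as virtual wins and visits seeded into a node before any real playouts; fewer virtual visits means higher a priori uncertainty. The class, constants, and formula below are illustrative assumptions, not any particular program's code:

import math

class Node:
    def __init__(self, prior_value=0.5, prior_visits=10):
        # More prior_visits = lower a priori uncertainty (harder to move).
        # An "atari" move could keep prior_value moderate but use a small
        # prior_visits, so real playouts quickly dominate the estimate.
        self.wins = prior_value * prior_visits
        self.visits = prior_visits

    def ucb(self, parent_visits, c=1.0):
        mean = self.wins / self.visits
        return mean + c * math.sqrt(math.log(parent_visits) / self.visits)

def select(children, parent_visits):
    return max(children, key=lambda n: n.ucb(parent_visits))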
Re: [computer-go] Some ideas how to make strong heavy playouts
Mark Boon wrote:
>
> On 1-apr-08, at 17:37, Don Dailey wrote:
>> That's partly why I'm interested in exploring "on the fly" learning.
>> Learning outside the context of the position being played may not have
>> much relevance.
>
> That would be most interesting indeed. I'd like to try but keep
> running into obstacles.

Yes, this idea rolls off the lips nicely, but implementing it is another thing!

> For example: at the moment I have just a handful of patterns. These
> patterns are important or 'urgent' if you like and they are already
> enough to overcome the slow-down caused by pattern-matching. At the
> moment I play a pattern randomly without distinction between them.

UCT with MC is basically on-the-fly learning if you think about it. The learning is in the tree statistics, not in the patterns or playouts. And it's more like rote learning with no generalization.

The idea of playouts is a device that doesn't technically belong in these programs - they are a practical concession. It's not practical to build a tree to the end of the game; it would quickly exhaust memory. The playouts are a proxy, a substitute, a way to pretend you are following a tree that is being built. So the artificial part of this (in some sense) is the playouts and the patterns that go with heavy playouts. Of course, since the earlier MC programs, we have moved the other way and made the tree act more like the heavy playouts, by imposing a priori knowledge on the tree itself just like we do with the playouts.

Sometimes I get anal about these things and wish there were a common, consistent framework without any artificial separation. What has occurred to me is trying to find a sophisticated way to eliminate the tree, and yet still maintain the specificity that a tree gives you.

My first very naive attempt was to use 3x3 patterns to specify a move, instead of 1x1 patterns (the point in question). In other words, the move e5 on an empty board is a different move than e5 when there is a stone on e6, for instance. So I tried using UCT in hash table mode, where edges of the graph were 3x3 patterns and each point on the board was part of the pattern signature (so the same patterns were different if they appeared on different points of the board). So I still had a tree of sorts, but each node was shared with many positions that probably didn't resemble each other except at one local point. That is no good. In fact, this is barely better than just using the points directly. There is very little of the specificity of a tree.

So I believe a better approach is a heavy playout approach with NO tree. Instead, rules would evolve based on knowledge learned from each playout - rules that would eventually move uniformly random moves into highly directed ones. All-moves-as-first teaches us that in the general case a move that is good now is good later, or vice versa. But it needs to go way farther than that. It needs to "act like a tree" when something specific needs to be handled, and generalize when that is most appropriate. If something like this could be made to work, a tree could probably be built on top of it if desired. This would be a super-playout approach.

It would be interesting to see how far you could take a tree-less approach like this. Certainly you should be able to do far better than straight tree-less MC.

To summarize, I think most of what is needed can be generalized on the fly, but a good system should be able to automatically adopt any degree of specificity required.
The problem is how to create a mechanism that can detect that more specificity is required, without imposing it on all the moves. I have some rough ideas on how to do this.

- Don
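A rough sketch of the per-point 3x3 signature Don describes: a node key built from a point and the colours of its 3x3 neighbourhood, looked up in a shared hash table of statistics instead of a real tree. The Zobrist-style hashing and all names below are my assumptions for illustration:

import random

random.seed(0)
# one random 64-bit code per (neighbour index 0..8, colour 0..2, board point)
ZOBRIST = [[[random.getrandbits(64) for _ in range(361)]
            for _ in range(3)] for _ in range(9)]

def signature(point, neighbourhood):
    """neighbourhood: 9 colours (0 empty, 1 black, 2 white) around `point`,
    in a fixed order; `point` is an index 0..360 on a 19x19 board."""
    h = 0
    for i, colour in enumerate(neighbourhood):
        h ^= ZOBRIST[i][colour][point]
    return h

stats = {}   # signature -> (wins, visits), shared across positions

def update(point, neighbourhood, won):
    key = signature(point, neighbourhood)
    w, v = stats.get(key, (0, 0))
    stats[key] = (w + int(won), v + 1)

Because the key ignores everything outside the 3x3 neighbourhood, unrelated positions share statistics, which is the lack of specificity Don complains about.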
Re: [computer-go] Some ideas how to make strong heavy playouts
> So I believe a better approach is a heavy playout approach with NO
> tree. Instead, rules would evolve based on knowledge learned from each
> playout - rules that would eventually move uniformly random moves into
> highly directed ones. All-moves-as-first teaches us that in the
> general case a move that is good now is good later, or vice versa. But
> it needs to go way farther than that. It needs to "act like a tree"
> when something specific needs to be handled, and generalize when that is
> most appropriate. If something like this could be made to work, a
> tree could probably be built on top of it if desired. This would be a
> super-playout approach.

This looks very much like the way human players work (albeit with a tree): read local sequences and outcomes that can be kept in reserve for a long time, but called up just about any time depending on the situation. Big chunks.

Jonas