Re: [computer-go] State of the art of pattern matching
On Tue, 2008-04-01 at 15:18 -0400, Joshua Shriver wrote:
> Do you have a link to those papers?

There is still one listed on the computer go bibliography:
http://www.cs.ualberta.ca/~games/go/compgo_biblio/

The links don't seem to work, so I set up a copy here:
http://nngs.ziar.net/cgml/gotxt/kojima.ps

HTH,
AvK
Re: [computer-go] State of the art of pattern matching
Jonas Kahn wrote:
> I guess you have checked that with your rules for getting probability
> distributions out of gammas, the mean of the probability of your move 1
> was the one that you observed (about 40%)?

If I understand your post, there may be a misunderstanding, and it's my fault. Here gamma is not a gamma function nor a gamma distribution but a constant. It is just the same Greek letter. I don't remember if it was in Crazy Stone's description or in some other paper I read to understand what Bradley-Terry models are; I just got used to that notation.

The best thing about BT models (for me) is the extreme simplicity at runtime (calibration may have more black magic), so:

  prob of move i = gamma[i] / SUM_j gamma[j]

where gamma[·] is a constant each pattern has. Setting those constants is the learning process.

The 40% is obtained between moves 20 and 119 of over 55K games. That is more than 5M competitions. The patterns are learned for other move numbers as well. It considers the urgency modified by the first time the pattern appears. Also ko, but ko is learned to (hopefully) find ko threats; its impact on guessing is less important.

It counts the number of times the first move was the right move, then the first + second, etc.

The reason why two different ways of measuring the same thing:

a. Probability expected for each move.
b. Number of guesses of the best move, 2nd best, 3rd best, etc.

don't match is mainly academic, because from a practical point of view the important thing is to have a good move generator and (even more important) to understand its limitations. It cannot be considered the only source of moves, but the first moves in terms of urgency are moves that should be paid attention to.

I admit that what I call a probability distribution over the legal moves is not really balanced, and I don't understand why. Nevertheless, in terms of urgency, better moves get higher urgency. This is all I really need. Of course, I would welcome an explanation of why the two things don't match, or someone else verifying whether his patterns give correct values.

Jacques.
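As an illustration only (not Jacques's actual code), the two measurements he compares could be computed like this, assuming one gamma per candidate move; all names and data structures here are made up:

def move_probabilities(gammas):
    """Bradley-Terry: prob of move i = gamma[i] / sum_j gamma[j]."""
    total = sum(gammas.values())
    return {move: g / total for move, g in gammas.items()}

def evaluate(positions):
    """positions: list of (gammas_for_legal_moves, expert_move) pairs."""
    mean_prob = 0.0          # (a) expected probability of the expert move
    rank_hits = {}           # (b) how often the expert move is ranked 1st, 2nd, ...
    for gammas, expert_move in positions:
        probs = move_probabilities(gammas)
        mean_prob += probs[expert_move]
        ranking = sorted(probs, key=probs.get, reverse=True)
        rank = ranking.index(expert_move) + 1
        rank_hits[rank] = rank_hits.get(rank, 0) + 1
    return mean_prob / len(positions), rank_hits

If the model were perfectly calibrated, measurement (a) and the rank-1 rate in (b) would agree; the thread's question is why they don't.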
Re: [computer-go] Some ideas how to make strong heavy playouts
On 1-apr-08, at 17:37, Don Dailey wrote:
> That's partly why I'm interested in exploring "on the fly" learning.
> Learning outside the context of the position being played may not have
> much relevance.

That would be most interesting indeed. I'd like to try but keep running into obstacles.

For example: at the moment I have just a handful of patterns. These patterns are important or 'urgent' if you like, and they are already enough to overcome the slow-down caused by pattern-matching. At the moment I play a pattern randomly, without distinction between them.

If I want to make anything 'learning' then I have to harvest patterns and somehow compute their importance / urgency. There are multiple ways to do that, and Remi wrote a paper about one of them. At the moment I use the average length of time a pattern is on the board. Urgent patterns remain on the board for only a short while. Whether this is better or worse than Remi's way I don't know.

So I have now run the program for a few hundred games, adjusting the urgency of the patterns on a continuing basis. And the program got weaker! Selecting one at random is apparently superior to selecting a pattern based on urgency. This is why I started out with just a few patterns: it's easier to see what's happening. When I look at the urgencies computed they actually look very reasonable. So that's not the problem. When I think about what could cause this, the only thing I can imagine is, again, that play becomes more deterministic.

This is a recurring theme in my tests. Apparently it's something important that still escapes me and which I have to understand to make real progress.

Mark
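For what it's worth, here is a toy sketch of one reading of Mark's "average time on the board" urgency (urgency as the inverse of mean lifetime) and of weighted versus uniform selection. The names, constants, and the inverse-lifetime formula are illustrative assumptions, not his implementation:

import random

lifetimes = {}   # pattern id -> list of observed lifetimes (in moves, >= 1)

def record_lifetime(pattern, moves_on_board):
    lifetimes.setdefault(pattern, []).append(moves_on_board)

def urgency(pattern):
    obs = lifetimes.get(pattern)
    if not obs:
        return 1.0                      # unseen patterns get a neutral urgency
    return 1.0 / (sum(obs) / len(obs))  # short-lived patterns are more urgent

def pick_move(matched):
    """matched: list of (move, pattern) pairs found in the current position."""
    if not matched:
        return None
    weights = [urgency(p) for _, p in matched]
    # Mark's stronger variant would simply be: random.choice(matched)[0]
    return random.choices(matched, weights=weights, k=1)[0][0]

The weighted version is more deterministic in the sense that the same few urgent patterns dominate the selection, which is the effect Mark suspects.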
Re: [computer-go] State of the art of pattern matching
On Wed, Apr 02, 2008 at 02:13:45PM +0100, Jacques Basaldúa wrote:
> Jonas Kahn wrote:
>
> > I guess you have checked that with your rules for getting probability
> > distributions out of gammas, the mean of the probability of your move 1
> > was the one that you observed (about 40%)?
>
> If I understand your post, there may be a misunderstanding, and it's my fault.
> Here gamma is not a gamma function nor a gamma distribution but a constant.
> It is just the same Greek letter. I don't remember if it was in Crazy Stone's
> description or in some other paper I read to understand what Bradley-Terry
> models are; I just got used to that notation.

Sorry, you were clear; it's only me who did not remember BT correctly. I thought of a transform P = Lambda(gamma2 - gamma1) in the end, like for Elo estimation.

The heavy tails could still very well come from inadequacy of the formula P(i) = gamma(i) / sum_j gamma(j) for the smaller probabilities.

The most natural explanation for your underestimating the probability that the first move is the right one would come from correlations between patterns. A short toy model, where only the first move can have one or two patterns with the same individual value but varying combined value, suggests that correlations between patterns would have to be positive to explain your result. I am a bit surprised, since I would have thought that correlations between patterns would rather be negative...

Maybe you can test that in the following way: if you have, say, 500 patterns that have a much greater value than the others, you could add to your learning an entry 'there is both pattern 1 and pattern 2' for all pairs of patterns among those 500, and see if you get better predictions in the end. Of course, if too many patterns are really relevant, this is hard to do.

Anyhow, as you said, what's important is having good suggestions for urgent moves, not this kind of discrepancy.

Jonas
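One way that test could look as a sketch, assuming the generalized Bradley-Terry "team" form in which a move's strength is the product of the gammas of all features it matches (that form, and every name below, is an assumption for illustration, not necessarily Jacques's exact setup):

from itertools import combinations

TOP_PATTERNS = set()   # e.g. the ~500 patterns with the largest gammas

def features_with_pairs(patterns_present):
    """patterns_present: the single-pattern features matched by a move."""
    feats = list(patterns_present)
    strong = [p for p in patterns_present if p in TOP_PATTERNS]
    for a, b in combinations(sorted(strong), 2):
        feats.append(("pair", a, b))    # one extra feature per strong pair
    return feats

def team_gamma(feats, gammas):
    """Move strength: product of the gammas of all its features."""
    g = 1.0
    for f in feats:
        g *= gammas.get(f, 1.0)         # unknown features count as neutral
    return g

The pair features would then be learned alongside the single-pattern gammas, and their fitted values indicate whether the strong patterns are positively or negatively correlated.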
Re: [computer-go] Some ideas how to make strong heavy playouts
> > By contrast, you
> > should test (in the tree) a kind of move that is either good or average,
> > but not either average or bad, even if it's the same amount of
> > information. In the tree, you look for the best move. Near the root at
> > least; when going deeper and the evaluation being less precise, you
> > merely look for good moves, that keep a trustworthy evaluation of the
> > position, and try to avoid brittle ones.
>
> Again, you are semantically challenged but basically correct. A move
> that you can statically evaluate as being on the lower end of the scale
> does not have much information content - in other words evaluating it in
> the tree has very little chance of changing the score.

OK. Then we agree. In fact, that's what UCB does: look for more information about what could be the best.

By the way, there are schemes for assigning an a priori value to a node, so as to progressively widen the search. Has anyone tried changing the a priori uncertainty instead? An atari or such would get a higher uncertainty, but maybe not a higher a priori value.

> The alpha-beta procedure in general is about information content in a
> big way. Many of the branches cut off in alpha-beta spring from great
> moves that have no chance of changing the score, so there is no useful
> information there. You would not look at those moves just because
> they happen to be "really great" moves.
>
> > In the playouts, that's another matter. I would say that (almost) always
> > playing 'out of atari' would add stability, much in the way Magnus
> > Persson very well explained.
> >
> > What do we want of playout policies?
> > As static evaluation functions, we would want them to give the right
> > ordering of move values, with differences as wide as possible.
> > More precisely, we would want that among the best moves; it's not
> > important if the evaluation is not precise for bad moves.
>
> Maybe I'm semantically challenged now, but the only correct ordering of
> move values is win or lose. I suppose you are saying that there
> should be a variety of scores that somehow reflect the difficulty of
> winning?

Yes, of course the true evaluation is zero or one, and we cannot access it; more to the point, we cannot even approach it.

Any evaluation function such that, in a given position, the evaluation of the winning moves is better than that of the losing moves would be just as ideal. In fact, any evaluation such that the best move is a winning move, if there is one, would still be ideal. And we could have a useful ordering of the losing moves if all moves lose.

Maybe one of these evaluation functions is available as a playout policy. Of course we cannot find one, but we could use this hypothetical policy as a reference.

Another slightly more concrete possible reference playout policy would be: the full-fledged program plays the playout. *If* the programs are still random enough, this would make a lot of sense: we are then evaluating the winning probability of the program against itself. That's at least one of the opponents that is perfectly modelled. Such an ideal playout policy would not undervalue light play, for example, as it would know how to connect when threatened to be cut.

Jonas
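One way the "same a priori value but different a priori uncertainty" idea could be expressed, purely as a sketch, is as virtual wins and visits seeded into a node before any real playouts; fewer virtual visits means higher a priori uncertainty. The class, constants, and formula below are illustrative assumptions, not any particular program's code:

import math

class Node:
    def __init__(self, prior_value=0.5, prior_visits=10):
        # More prior_visits = lower a priori uncertainty (harder to move).
        # An "atari" move could keep prior_value moderate but use a small
        # prior_visits, so real playouts quickly dominate the estimate.
        self.wins = prior_value * prior_visits
        self.visits = prior_visits

    def ucb(self, parent_visits, c=1.0):
        mean = self.wins / self.visits
        return mean + c * math.sqrt(math.log(parent_visits) / self.visits)

def select(children, parent_visits):
    return max(children, key=lambda n: n.ucb(parent_visits))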
Re: [computer-go] Some ideas how to make strong heavy playouts
Mark Boon wrote:
>
> On 1-apr-08, at 17:37, Don Dailey wrote:
>> That's partly why I'm interested in exploring "on the fly" learning.
>> Learning outside the context of the position being played may not have
>> much relevance.
>
> That would be most interesting indeed. I'd like to try but keep
> running into obstacles.

Yes, this idea rolls off the lips nicely, but implementing it is another thing!

> For example: at the moment I have just a handful of patterns. These
> patterns are important or 'urgent' if you like and they are already
> enough to overcome the slow-down caused by pattern-matching. At the
> moment I play a pattern randomly without distinction between them.

UCT with MC is basically on-the-fly learning if you think about it. The learning is in the tree statistics, not in the patterns or playouts. And it's more like rote learning with no generalization.

The idea of playouts is a device that doesn't technically belong in these programs - they are a practical concession. It's not practical to build a tree to the end of the game; it would quickly exhaust memory. The playouts are a proxy, a substitute, a way to pretend you are following a tree that is being built. So the artificial part of this (in some sense) is the playouts and the patterns that go with heavy playouts. Of course, since the earlier MC programs, we have moved the other way and made the tree act more like the heavy playouts, by imposing a priori knowledge on the tree itself just like we do with the playouts.

Sometimes I get anal about these things and wish there were a common, consistent framework without any artificial separation. What has occurred to me is trying to find a sophisticated way to eliminate the tree, and yet still maintain the specificity that a tree gives you.

My first very naive attempt was to use 3x3 patterns to specify a move, instead of 1x1 patterns (the point in question). In other words, the move e5 on an empty board is a different move than e5 when there is a stone on e6, for instance. So I tried using UCT in hash table mode, where edges of the graph were 3x3 patterns and each point on the board was part of the pattern signature (so the same patterns were different if they appeared on different points of the board). So I still had a tree of sorts, but each node was shared with many positions that probably didn't resemble each other except at one local point. That is no good. In fact, this is barely better than just using the points directly. There is very little of the specificity of a tree.

So I believe a better approach is a heavy playout approach with NO tree. Instead, rules would evolve based on knowledge learned from each playout - rules that would eventually move uniformly random moves into highly directed ones. All-moves-as-first teaches us that in the general case a move that is good now is good later, or vice versa. But it needs to go way farther than that. It needs to "act like a tree" when something specific needs to be handled, and generalize when that is most appropriate. If something like this could be made to work, a tree could probably be built on top of it if desired. This would be a super-playout approach.

It would be interesting to see how far you could take a tree-less approach like this. Certainly you should be able to do far better than straight tree-less MC.

To summarize, I think most of what is needed can be generalized on the fly, but a good system should be able to automatically adopt any degree of specificity required.
The problem is how to create a mechanism that can detect that more specificity is required, without imposing it on all the moves. I have some rough ideas on how to do this.

- Don
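A rough sketch of the per-point 3x3 signature Don describes: a node key built from a point and the colours of its 3x3 neighbourhood, looked up in a shared hash table of statistics instead of a real tree. The Zobrist-style hashing and all names below are my assumptions for illustration:

import random

random.seed(0)
# one random 64-bit code per (neighbour index 0..8, colour 0..2, board point)
ZOBRIST = [[[random.getrandbits(64) for _ in range(361)]
            for _ in range(3)] for _ in range(9)]

def signature(point, neighbourhood):
    """neighbourhood: 9 colours (0 empty, 1 black, 2 white) around `point`,
    in a fixed order; `point` is an index 0..360 on a 19x19 board."""
    h = 0
    for i, colour in enumerate(neighbourhood):
        h ^= ZOBRIST[i][colour][point]
    return h

stats = {}   # signature -> (wins, visits), shared across positions

def update(point, neighbourhood, won):
    key = signature(point, neighbourhood)
    w, v = stats.get(key, (0, 0))
    stats[key] = (w + int(won), v + 1)

Because the key ignores everything outside the 3x3 neighbourhood, unrelated positions share statistics, which is the lack of specificity Don complains about.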
Re: [computer-go] Some ideas how to make strong heavy playouts
> So I believe a better approach is a heavy playout approach with NO
> tree. Instead, rules would evolve based on knowledge learned from each
> playout - rules that would eventually move uniformly random moves into
> highly directed ones. All-moves-as-first teaches us that in the
> general case a move that is good now is good later, or vice versa. But
> it needs to go way farther than that. It needs to "act like a tree"
> when something specific needs to be handled, and generalize when that is
> most appropriate. If something like this could be made to work, a
> tree could probably be built on top of it if desired. This would be a
> super-playout approach.

This looks very much like the way human players work (albeit with a tree): read local sequences and outcomes that can be kept in reserve for a long time, but called up just about any time depending on the situation. Big chunks.

Jonas