Mark Boon wrote:
>
> On 1-apr-08, at 17:37, Don Dailey wrote:
>> That's partly why I'm interested in exploring "on the fly" learning.
>> Learning outside the context of the position being played may not have
>> much relevance.
>
> That would be most interesting indeed. I'd like to try but keep
> running into obstacles.

Yes, this idea rolls off the lips nicely, but implementing it is
another thing!
>
> For example: at the moment I have just a handful of patterns. These
> patterns are important or 'urgent' if you like and they are already
> enough to overcome the slow-down caused by pattern-matching. At the
> moment I play a pattern randomly without distinction between them.
UCT with MC is basically on the fly learning if you think about it.  
The learning is in the tree statistics and not the patterns or
playouts.    And it's more like rote learning with no generalization.
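
To make "the learning is in the tree statistics" concrete, here is a
minimal sketch of the standard UCB1 rule that UCT applies at each node.
The names (stats, ucb1_select, update) and the exploration constant
c=1.4 are assumptions chosen for illustration, not taken from any
particular program.

```python
import math

def ucb1_select(stats, c=1.4):
    """Pick the move with the highest UCB1 score.

    stats maps move -> (wins, visits). Everything the search has
    "learned" about this node lives in these counters.
    """
    total = sum(visits for _, visits in stats.values())
    best_move, best_score = None, float("-inf")
    for move, (wins, visits) in stats.items():
        if visits == 0:
            return move  # try every move once before comparing scores
        score = wins / visits + c * math.sqrt(math.log(total) / visits)
        if score > best_score:
            best_move, best_score = move, score
    return best_move

def update(stats, move, won):
    """Fold one playout result back into the node's counters."""
    wins, visits = stats.get(move, (0, 0))
    stats[move] = (wins + (1 if won else 0), visits + 1)
```

Nothing here generalizes: a result recorded for one node says nothing
about a similar position elsewhere in the tree, which is the "rote
learning" point.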

Playouts are a device that doesn't technically belong in these
programs - they are a practical concession. It's not practical to
build a tree to the end of the game; that would quickly exhaust
memory. The playouts are a proxy, a substitute, a way to pretend you
are following a tree that is being built. So the artificial part of
this (in some sense) is the playouts and the patterns that go with
heavy playouts.

Of course, since the earlier MC programs we have moved the other way and
made the tree act more like the heavy playouts by imposing a priori
knowledge on the tree itself, just like we do with the playouts.

Sometimes I get anal about these things and wish there were a common,
consistent framework without any artificial separation. What has
occurred to me is trying to find a sophisticated way to eliminate the
tree and yet still maintain the specificity that a tree gives you. My
first, very naive attempt was to use 3x3 patterns to specify a move
instead of 1x1 patterns (the point in question). In other words, the
move e5 on an empty board is a different move than e5 when there is a
stone on e6, for instance. So I tried using UCT in hash table mode,
where edges of the graph were 3x3 patterns and each point on the board
was part of the pattern signature (so the same pattern was treated as
different if it appeared on a different point of the board). So I
still had a tree of sorts, but each node was shared among many
positions that probably didn't resemble each other except at one local
point. That is no good.

In fact, this is barely better than just using the points directly. 
There is very little of the specificity of a tree.  
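
A rough sketch of how such a (point, 3x3 pattern) key might be built is
below. The board encoding (a list of strings), the EDGE marker, and the
function names are all assumptions made for the example, not a
description of the actual experiment.

```python
from collections import defaultdict

EMPTY, BLACK, WHITE, EDGE = ".", "X", "O", "#"

def pattern_key(board, row, col):
    """Return (point, 3x3 neighbourhood) for a move at (row, col).

    board is a list of equal-length strings; off-board cells are
    recorded as EDGE so that edge and corner points get their own
    patterns.
    """
    size = len(board)
    cells = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            if 0 <= r < size and 0 <= c < size:
                cells.append(board[r][c])
            else:
                cells.append(EDGE)
    return (row, col), "".join(cells)

# key -> [wins, visits]; shared by every position that produces the key
stats = defaultdict(lambda: [0, 0])

def record(board, row, col, won):
    """Credit one playout result to the (point, pattern) entry."""
    entry = stats[pattern_key(board, row, col)]
    entry[0] += int(won)
    entry[1] += 1
```

The weakness is visible in the key itself: two boards that agree only on
these nine cells update the same counters, however different the rest of
the position is.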

So I believe a better approach is a heavy playout approach with NO
tree. Instead, rules would evolve based on knowledge learned from each
playout - rules that would eventually turn uniformly random moves into
highly directed ones. All-moves-as-first teaches us that, in the
general case, a move that is good now is good later, and vice versa. But
it needs to go way further than that. It needs to "act like a tree"
when something specific needs to be handled and generalize when that is
most appropriate. If something like this could be made to work, a
tree could probably be built on top of it if desired. This would be a
super-playout approach.
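
For reference, the all-moves-as-first bookkeeping mentioned above can be
sketched as follows. This shows only the plain AMAF update, not the
rule-evolving scheme being proposed; the data layout and names are
assumptions for illustration.

```python
from collections import defaultdict

def amaf_update(stats, playout_moves, winner):
    """Credit every move in a playout as if it had been played first.

    stats maps (color, move) -> [wins, visits].
    playout_moves is a list of (color, move) in the order played.
    Only the first play on each point is counted.
    """
    seen = set()
    for color, move in playout_moves:
        if move in seen:
            continue  # a later play on the same point is ignored
        seen.add(move)
        entry = stats[(color, move)]
        entry[0] += int(color == winner)
        entry[1] += 1
```

A call might look like this:

```python
stats = defaultdict(lambda: [0, 0])
amaf_update(stats, [("B", "e5"), ("W", "d4"), ("B", "d4")], "B")
```

This is the fully general end of the spectrum: the statistics ignore
when and in what context a move was played.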

It would be interesting to see how far you could take a tree-less
approach like this.  Certainly you should be able to do far better than
straight tree-less MC.  

To summarize, I think most of what is needed can be generalized on the
fly, but a good system should be able to automatically adopt any degree
of specificity required. The problem is how to create a mechanism
that can detect when more specificity is required without imposing it on
all the moves. I have some rough ideas on how to do this.

- Don



>
> If I want to make anything 'learning' then I have to harvest patterns
> and somehow compute their importance / urgency. There are multiple
> ways to do that and Remi wrote a paper about one of them. At the
> moment I use the average length a pattern is on the board. Urgent
> patterns remain on the board for only a short while. Whether this is
> better or worse than Remi's way I don't know.
>
> So I have now run the program for a few hundred games, adjusting the
> urgency of the patterns on a continuing basis. And the program got
> weaker! Selecting one at random is apparently superior to selecting a
> pattern based on urgency. This is why I started out with just a few
> patterns because it's easier to see what's happening. When I look at
> the urgencies computed they actually look very reasonable. So that's
> not the problem. When I think about what could cause this, the only
> thing I can imagine is, again, that play becomes more deterministic.
>
> This is a recurring theme in my tests. Apparently it's something
> important that still escapes me and which I have to understand to make
> real progress.
>
> Mark
>
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/