I think there was some confusion in Don's post on ``out of atari'' in
playouts.
For one thing, I do not agree with the maximal information argument.
Testing ``out of atari'' moves is worthwhile not because they might be
good or might be bad, but merely because they might be good. By
contrast, you should test (in the tree) a kind of move that is either
good or average, rather than one that is either average or bad, even if
it carries the same amount of information. In the tree, you look for
the best move, near the root at least; deeper down, where the
evaluation is less precise, you merely look for good moves that keep
the evaluation of the position trustworthy, and you try to avoid
brittle ones.

In the playouts, that's another matter. I would say that (almost)
always playing 'out of atari' would add stability, much in the way
Magnus Persson explained very well.

What do we want from playout policies?
As static evaluation functions, we would want them to give the right
ordering of move values, with differences as wide as possible.
More precisely, we want this among the best moves; it does not matter
much if the evaluation is imprecise for bad moves.
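
A minimal sketch of what I mean by using the playout policy as a static
evaluation function (Python; `playout`, `position` and the candidate
`moves` stand for whatever the engine already has, this is only an
illustration):

    def evaluate(playout, position, move, n_playouts=100):
        """Average playout result (1 = win, 0 = loss) after playing
        `move`; this average is the 'static evaluation' of the move."""
        results = [playout(position, move) for _ in range(n_playouts)]
        return sum(results) / n_playouts

    def order_moves(playout, position, moves):
        # Only the ordering matters, and mostly among the best moves:
        # imprecision on clearly bad moves is harmless.
        return sorted(moves, key=lambda m: evaluate(playout, position, m),
                      reverse=True)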

Now, we build a tree while computing the evaluation function, so we can
allow for false good moves if they are quickly seen as such in the
tree, that is after a search of one to three plies, and if the false
good moves are not too numerous.
False bad moves are much worse, since we might never explore the branch
long enough to correct the impression.

The previous paragraph also applies to pre-ordering (I keep a model
like that of Crazystone in mind, with a priori probabilities for each
move in the tree, and a probability distribution over moves in the
playouts).
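
To fix ideas, here is a toy version of that kind of model (Python; the
features, their weights and the data layout are invented for
illustration, not Crazystone's actual machinery). Each move gets a
weight from the features it matches; normalised, these weights give the
a priori probabilities in the tree, and the playouts sample from the
same distribution:

    import random

    # Invented feature weights, for illustration only.
    FEATURE_WEIGHTS = {
        "capture":      10.0,
        "out_of_atari":  8.0,
        "contact":       2.0,
    }

    def move_weight(features):
        # `features` is the set of feature names a move matches.
        w = 1.0
        for f in features:
            w *= FEATURE_WEIGHTS.get(f, 1.0)
        return w

    def move_distribution(moves_with_features):
        # moves_with_features: list of (move, set of features).
        # Returns a list of (move, a priori probability).
        weighted = [(m, move_weight(fs)) for m, fs in moves_with_features]
        total = sum(w for _, w in weighted)
        return [(m, w / total) for m, w in weighted]

    def sample_playout_move(moves_with_features):
        # In the playouts, draw a move from the same distribution.
        dist = move_distribution(moves_with_features)
        moves = [m for m, _ in dist]
        probs = [p for _, p in dist]
        return random.choices(moves, weights=probs, k=1)[0]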


Conclusions:
It does not matter if there is a bias in the playout policy, as long as
it is the same for all moves.
A bias toward solid moves is therefore no nuisance...
Playing (not too many) nonsensical moves would only be a nuisance if
some positions call for many more urgent moves in their playouts than
others do.

What matters is telling apart two moves with very different values.
This is where noise reduction comes into play: if in all playouts there
is a 50% chance that a living group dies, and its death decides the
game, then the difference in evaluation between the other positions is
divided by two...
Taking out all the common noise (that is, the mistakes that appear in
all playouts) makes the distinction easier.
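
A toy computation of that ``divided by two'' effect (Python, with
invented numbers): say move A would win 60% and move B 40% of the
playouts if the big group lives; if in every playout the group also
dies with probability 1/2 and the game is then lost regardless of A or
B, the observable gap shrinks from 20 points to 10:

    def observed_winrate(winrate_if_group_lives, p_group_dies=0.5):
        # The group dying is common noise: it loses the game whatever
        # the move under test, so only the surviving playouts
        # discriminate between the moves.
        return (1 - p_group_dies) * winrate_if_group_lives

    gap_without_noise = 0.60 - 0.40                                   # 0.20
    gap_with_noise = observed_winrate(0.60) - observed_winrate(0.40)  # 0.10
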
On the other hand, concentrating on a wrong evaluation (after this
particular move, the group is dead) would be a catastrophe.
If this comes from one particular move, it should be noticed and
played/avoided systematically.


About learning on the fly:
I agree completely, that was one of my first posts.
However, I really think we should have learnt patterns (and other
properties) beforehand: you cannot learn the whole of go, or even your
own game, in one game.
Learning is a good thing, but you must find the good move first, and as
quickly as possible.
For one thing, if we learn patterns on the fly, they should obviously
be localized (not translation invariant). We could also learn short
move sequences.
In a setting with a probability distribution over moves, taking that
into account merely means changing the probability distribution. The
question is: how?
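
I do not have the answer, but just to make ``changing the probability
distribution'' concrete, here is one naive possibility (Python; the
update rule and the rate are invented): nudge the weight of a local
pattern up or down according to the results of the playouts in which it
was played, then renormalise as usual:

    def update_pattern_weight(weight, playout_won, rate=0.05):
        # Naive on-the-fly update: a pattern played in a won playout
        # becomes slightly more likely, in a lost one slightly less.
        return weight * (1 + rate) if playout_won else weight * (1 - rate)

    # Example: a pattern seen in three won playouts and one lost one.
    w = 1.0
    for won in (True, True, False, True):
        w = update_pattern_weight(w, won)
    # w is now about 1.10: this pattern has become a bit more probable.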

By the way, my guess is that learning on the fly would be more
important in the playouts than in the tree: it would contribute to
stabilizing the playouts. The tree should end up with the good moves
anyhow.

This learning should probably also come from the playouts (they carry a
great deal of information, and we could stick to information already
computed for the playouts, allowing easy re-use), automatically
building a status for groups that are only settled near the end...
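
A rough sketch of the kind of re-use I mean (Python; how groups are
identified at the root and how survivors are detected at the end of a
playout is left to the engine, this is not a full implementation): just
count, for each group present at the root, the fraction of playouts in
which it is still alive at the end:

    from collections import defaultdict

    class GroupStatus:
        # Accumulates, per group id seen at the root, how often the
        # group is still alive at the end of a playout.  Nothing extra
        # has to be computed during the playout itself.

        def __init__(self):
            self.alive = defaultdict(int)
            self.playouts = 0

        def record(self, surviving_group_ids):
            # Called once per finished playout with the ids of the root
            # groups whose stones are still on the board.
            self.playouts += 1
            for g in surviving_group_ids:
                self.alive[g] += 1

        def life_probability(self, group_id):
            if self.playouts == 0:
                return 0.5
            return self.alive[group_id] / self.playouts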

Jonas, who always rereads thinking ``How do I manage to be that
unclear!''