On Thu, 2007-07-05 at 10:50 -0600, David Silver wrote:
> We tried the whole spectrum from completely random to completely
> deterministic playouts, but we never came close to the performance of
> the "dumb" playouts! 

I don't understand - I thought MoGo wasn't using "dumb" play-outs?

> We have seen a similar effect many times in MoGo. Often we try
> something that seems like it should improve the quality of the
> simulation player, but it makes the overall performance worse. It is
> frustrating and surprising! Has anyone else encountered this? 


Here is why this happens:    

Let's think of this in the context of pruning moves, since controlling
the play-outs can be cast in this way.

Presumably, if you run 1000 random play-outs from a given position you
will get a fair indication of "how good" the position is.   
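
To make the setup concrete, here is a minimal sketch of that kind of
Monte-Carlo evaluation in Python.  The game interface (legal_moves,
play, is_terminal, winner) is made up for illustration - it is not
MoGo's or Lazarus's code:

import random

def random_playout(position, legal_moves, play, is_terminal, winner):
    # Play uniformly random legal moves until the game ends.
    while not is_terminal(position):
        move = random.choice(legal_moves(position))
        position = play(position, move)
    return winner(position)

def estimate_value(position, n_playouts, legal_moves, play,
                   is_terminal, winner):
    # Fraction of playouts won by Black - a rough indication of how
    # good the position is for Black.
    wins = sum(
        random_playout(position, legal_moves, play, is_terminal,
                       winner) == "black"
        for _ in range(n_playouts)
    )
    return wins / n_playouts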

But what if you are able to prune out many of the bad moves in that
simulation?   Would this improve the accuracy of the simulation?   

Probably, but not necessarily.  Suppose that during the play-outs you
are able to prune out 50% of the "bad" black moves, but only 30% of the
"bad" white moves.  You would then be playing 1000 simulations in which
BLACK plays consistently stronger, regardless of how good the actual
position is.

If the position were in fact pretty much even, it would look as if
black had a big advantage.  And if that color bias is consistent for
that "type" of position, building a UCT tree below it will not quickly
fix the problem.
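
To see how large the effect can be, here is a toy numerical experiment.
The model is entirely made up (it is not MoGo's playout policy or
anything like it): each side plays a fixed number of moves, a move is
"good" with probability 1/2, pruning catches a given fraction of the
bad ones, and the side that plays more good moves wins the playout.

import random

def playout_winner(black_prune, white_prune, moves_per_side=40):
    # black_prune / white_prune are the fractions of bad moves that
    # the playout policy manages to avoid for each color.
    def good_moves(prune_rate):
        good = 0
        for _ in range(moves_per_side):
            # A move is good outright (p = 0.5), or it was bad but the
            # pruning caught it and a good move was played instead.
            if random.random() < 0.5 or random.random() < prune_rate:
                good += 1
        return good
    b, w = good_moves(black_prune), good_moves(white_prune)
    if b == w:
        return random.choice("BW")   # break ties at random
    return "B" if b > w else "W"

def estimate(black_prune, white_prune, n=10000):
    wins = sum(playout_winner(black_prune, white_prune) == "B"
               for _ in range(n))
    return wins / n

print("no pruning:        ", estimate(0.0, 0.0))   # about 0.50
print("asymmetric pruning:", estimate(0.5, 0.3))   # far above 0.50

With no pruning the estimate comes out near 0.50, as it should.  With
50% pruning for black and only 30% for white it comes out far above
0.50, even though the "position" is dead even by construction.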

The extra knowledge you impose is not impartial: it works better for
one side than the other, and differently for one position than another.
So even if the average quality of the play-outs improves, each position
responds differently to the extra knowledge, making it more difficult
to compare one position to another.

There is one other issue I have seen that is similar.  Sometimes
Lazarus will play a move that neither hurts nor helps its position.
It's not a wasted move, because the opponent must respond or else lose.
An example is a simple self-atari which is itself a direct threat.  The
opponent is forced to respond, so there is no reason not to try for the
cheap shot in his territory, but in the grand scheme of things such
moves are a distraction, and if you could remove them from the tree it
would help the program focus on what is really important.  However, it
sometimes pays to try moves like these.  When I "fixed" this problem in
Lazarus, it started winning less against weaker programs, simply
because they sometimes fail to defend.

I imagine this can happen in more sophisticated contexts, where certain
moves could be very effective in exploiting more naive (but not totally
stupid) programs.  In such a case, an improvement could make your
program appear weaker.  I don't think this is merely academic, because
I model skill not as how many good moves you play but as how many bad
moves you avoid playing.  In other words, there is no such thing as a
good move - there are only bad moves.  So you want to present your
opponent with opportunities to play them.

In chess it works the same way: if you are playing a weak opponent,
it's really stupid to trade quickly into a drawn ending just because
the position is even.


- Don






_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
