> http://ewh.ieee.org/cmte/cis/mtsc/ieeecis/tutorial2007/Bruno_Bouzy_2007.pdf
> 
> Page 89, "which kind of outcome". This method is better than the above
> and similar to what Jonas seems to propose. The improvement is minor.

Looking at their proposal (45 * win + score), in contrast to mine there
is no scaling with 1/n. Yet they get an improvement, with something
that is in fact of the order 1/\sqrt{n}. But I think the latter does
not play a role.
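
For concreteness, here is a minimal Python sketch of both blends under
one plausible reading of the two formulas; the function names and the
per-playout averaging are my assumptions, not taken from the slides or
from any program:

  # Minimal sketch of the two blends; `scores` holds the final Go
  # scores (positive = win) of the n MC playouts run from one node.
  # All names are hypothetical, not from the slides or any program.

  def eval_fixed_weight(scores, bonus=45.0):
      # The slides' blend, read per playout: bonus * win + score,
      # averaged over playouts (win taken as 0 or 1).
      n = len(scores)
      return sum((bonus if s > 0 else 0.0) + s for s in scores) / n

  def eval_damped_score(scores):
      # My variant as read here: win rate plus the mean score scaled
      # by 1/n, so the score term fades as simulations accumulate.
      n = len(scores)
      win_rate = sum(1 for s in scores if s > 0) / n
      mean_score = sum(scores) / n
      return win_rate + mean_score / n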

Why?

First hypothesis: misleading data.

Second hypothesis: it is a better evaluation function, in the sense
that it would remain better if we had the ``exact'' expectation of the
evaluation function under the MC runs.
Here is an interesting argument towards the following:
- Even with many simulations, it is likely that all these simulations
  differ from one another by move 50.
  Suppose we had an oracle who, after 50 moves of an MC simulation,
  told us whether the game is a win or a loss under perfect play.
  Stopping the simulation there and taking this result would probably
  be a better evaluation function (discussion below). We could then ask
  how best to approximate this evaluation function from what we do
  know, namely the score at the end of the MC runs.
  Assigning the same value to a half-point victory and a 60-point
  victory is then ludicrous: MC is not perfect play, so we have an
  uncertainty that depends on how far we are into the game and on the
  position. But a 60-point victory would probably mean that we already
  had a victory after the first 50 moves of the simulation.
  Under this hypothesis, the evaluation of the node should be
  \int_{-score}^{\infty} Phi(x) dx, where Phi(x) is the probability
  density that the perfect-play result exceeds the MC score by x points
  (a sketch under a Gaussian assumption follows this list).

- Now, this Phi could be estimated by looking at the typical results of
  many simulations from the same position (assuming the mean or the
  median equals perfect play); alternatively, it could be estimated in
  practice from the simulations already run at the node.

- BUT this evaluation function would not be efficient: I have no access
  to any data, but I think that the MC simulations must have a large
  variance. That would give too little weight to the margin of victory,
  I guess.

- Why should this evaluation with the oracle be better? Well, it is the
  true minimax evaluation of the position from move 50, the same as
  counting at the end; what we have in between is noise. Making better
  playouts can result in worse play because they are more
  deterministic, so that the evaluation is less robust (it could always
  make the same mistake), but here it IS robust: we said perfect play,
  no mistake. If we had that at the root of the tree, that would be God
  playing.
  However, as I said, we arrive at a hardly tenable conclusion for the
  practical evaluation function. Why?
  Probably, the RIGHT evaluation function is not victory or defeat;
  it is: how likely is the program to win from this position (say,
  against itself)? Indeed, since we have random players, this even has
  a precise meaning. Hence, non-integer values can make sense.
  Now, since programs with tree search play better than their MC
  simulations, we would have less variance in the outcome from this
  infamous move 50, and hence we would still have a convolution, only a
  narrower one. We would then not use 0-1, but not anything too broad
  either. Something like (45 * win + score) is reasonable, around 0 in
  particular (it should saturate for high scores).

- In any case, this would also mean that the bonus for the win should
  be made higher in the yose. Indeed, the nearer the end, the more
  accurate the MC scoring is (for a single simulation), and hence the
  less we must convolve (the sketch below includes such a schedule).
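
To make this concrete, here is a minimal Python sketch, assuming Phi
is a zero-mean Gaussian (my assumption; nothing above fixes its shape).
The node evaluation \int_{-score}^{\infty} Phi(x) dx then reduces to
the normal CDF at score/sigma; the width sigma can be estimated from
the playouts at the node, and shrunk towards the yose. All names,
constants, and the linear schedule are illustrative:

  import math

  def node_eval(mc_score, sigma):
      # \int_{-score}^{+\infty} Phi(x) dx for a zero-mean Gaussian Phi
      # of width sigma: the normal CDF evaluated at mc_score / sigma.
      return 0.5 * (1.0 + math.erf(mc_score / (sigma * math.sqrt(2.0))))

  def estimate_sigma(scores):
      # Width of Phi estimated from the playout scores already run at
      # the node (sample standard deviation; assumes >= 2 playouts and
      # that their mean is close to the perfect-play result).
      n = len(scores)
      mean = sum(scores) / n
      return math.sqrt(sum((s - mean) ** 2 for s in scores) / (n - 1))

  def sigma_at_stage(moves_played, total_moves=250, sigma_opening=35.0):
      # Narrower Phi nearer the end: a single playout scores the
      # position more accurately in the yose, so we convolve less.
      # The linear schedule and both constants are purely illustrative.
      remaining = max(total_moves - moves_played, 1)
      return sigma_opening * remaining / total_moves

Note that this evaluation has exactly the shape argued for above: with
a large sigma (early game, noisy playouts) it is nearly linear in the
score around 0, and as sigma shrinks towards the yose it approaches
the pure 0-1 win/loss value, i.e., the win bonus dominates.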


And this is not having a spoilt child. It is explicitly trying to
imitate a better evaluation function, one that we cannot know directly.

Jonas