I respond to various items below. Sections of the original e-mail that I'm not responding to have been deleted.

Jacques Basaldúa wrote:

> Hello Jason
>
> I think what you are trying to do can be done more easily.

I guess the key question is "what am I trying to do?"

In UCT, the next move to simulate is chosen based on an estimated probability of winning. Correcting bias in that estimate should lead to better sampling.
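
For context, here is a minimal sketch of the UCB1 selection rule that UCT builds on; the child objects and the exploration constant c below are illustrative assumptions of mine, not something from this thread:

    import math

    def select_move(children, c=1.4):
        # children: objects with .wins and .visits simulation counts.
        # c: illustrative exploration constant (an assumption, not from the thread).
        total = sum(ch.visits for ch in children)

        def ucb1(ch):
            if ch.visits == 0:
                return float("inf")  # sample unvisited moves first
            win_rate = ch.wins / ch.visits  # estimated probability of winning
            return win_rate + c * math.sqrt(math.log(total) / ch.visits)

        return max(children, key=ucb1)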

If one abruptly stops all simulations and picks the "best" move based on this estimated probability, I think this may give the optimal answer: choosing the move with the highest expectation of winning the game. It's important to note that with a peaked distribution over moves' probabilities of winning, the estimated probability of winning rises slowly. That means a move with only a few simulations will never be chosen over a move with (a good win rate and) a lot of simulations.
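
To spell out what "stop and pick the best" would look like, a sketch under my own naming (not anyone's actual bot), reusing the child fields from the sketch above:

    def pick_final_move(children):
        # After search stops, choose the move with the highest estimated
        # probability of winning (wins/visits); max(..., 1) guards division by zero.
        return max(children, key=lambda ch: ch.wins / max(ch.visits, 1))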

> C. To compare if a move is better than another, you have
> to compare _confidence intervals_, i.e. the interval in which
> p (the unknown probability) lies, computed from your
> observed p-hat, n, and a desired confidence level, say 95%.
> These intervals can be computed with methods you can
> find by searching for "Confidence Interval for a Binomial
> Proportion". The most used are the Wilson and Agresti-Coull
> intervals. These intervals include the continuity correction
> you mention in your post. Other ways of comparing
> proportions are the difference between proportions, the
> relative risk, and the odds ratio, to name a few. My "Bible"
> for this is the book "Categorical Data Analysis" by
> Alan Agresti, published by Wiley & Sons.

With lots and lots of simulations, this could lead to a conclusion such as "move a is better than move b with 95% confidence". If a bot wants to prove with high confidence that the move it has selected is better than all the others, I suspect it would have to do lots and lots of simulations, which would be impractical. I think that is the same point you made later in your e-mail. I welcome someone proving us wrong ;)
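
For what it's worth, here is a sketch of the Wilson score interval Jacques mentions, under the assumption that each simulation is an independent win/loss trial (the function name and z = 1.96 for ~95% confidence are my choices):

    import math

    def wilson_interval(wins, n, z=1.96):
        # Wilson score interval for a binomial proportion; z = 1.96 is ~95% confidence.
        if n == 0:
            return (0.0, 1.0)
        p_hat = wins / n
        denom = 1 + z * z / n
        center = (p_hat + z * z / (2 * n)) / denom
        half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
        return (center - half, center + half)

    # e.g. wilson_interval(60, 100)   -> roughly (0.50, 0.69)
    # e.g. wilson_interval(500, 1000) -> roughly (0.47, 0.53)
    # The intervals overlap, so these counts can't separate the two moves.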

An alternative but related approach is "move a is better than move b with a p-value of xx%". Of course, I'm also not too sure how to use that result.
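
To make that concrete, here is a sketch using the textbook pooled two-proportion z-test (my choice of test, nothing the thread prescribes); it returns the two-sided p-value for the null hypothesis that moves a and b have the same true win probability:

    import math

    def two_proportion_p_value(wins_a, n_a, wins_b, n_b):
        # Two-sided p-value for H0: p_a == p_b, via a pooled z statistic.
        # Relies on the normal approximation, so n_a and n_b should be large.
        p_a, p_b = wins_a / n_a, wins_b / n_b
        pooled = (wins_a + wins_b) / (n_a + n_b)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (p_a - p_b) / se
        return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))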

> > To use these results, you must make some assumption
> > about the underlying distribution of a move's probability
> > of winning.
>
> That's the good news. You don't. There is no need to
> understand what complex mechanism produces p. Only
> that: same position == same p.

If you take a good look at your tests, they make very specific null hypotheses, which in effect make at least some assumptions about the underlying distributions (or try to wash away all distributional effects with the central limit theorem). For example, the usual two-proportion test treats each simulation as an independent Bernoulli trial with a fixed p and leans on the normal approximation.


> *Do not* expect a sound statistical analysis to tell you
> the best move, unless it is very obvious, n is immense, or
> your confidence level is extremely low. But, if you are
> lucky, it will tell you which moves are clearly bad and can
> be safely (= with a given confidence) pruned out.


I agree with that.
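
One way to act on that pruning idea, sketched under my own assumptions (it reuses the wilson_interval helper from above): drop any move whose upper confidence bound falls below the best lower bound among all moves.

    def prune_clearly_bad(moves, z=1.96):
        # moves: dict of move -> (wins, n) simulation counts.
        # A move is "clearly bad" (at the given confidence) when the upper bound
        # of its interval lies below the best lower bound over all moves.
        bounds = {m: wilson_interval(w, n, z) for m, (w, n) in moves.items()}
        best_lower = max(lo for lo, hi in bounds.values())
        return {m: moves[m] for m, (lo, hi) in bounds.items() if hi >= best_lower}
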
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
