On 4/6/07, Don Dailey <[EMAIL PROTECTED]> wrote:
On Fri, 2007-04-06 at 12:43 -0400, [EMAIL PROTECTED] wrote:
> Alpha/Beta cutoffs only make sense when calling the evaluation
> function twice on the exact same position can be guaranteed to
> provide
> the exact same value. This is obviously not the case for MC
> evaluation, hence the success of UCT.
I don't know if any of this is true.
Noticed the subtle "can be" in my statement above?
I'm quite aware of all the hairy details (e.g., strictly speaking the
use of a transposition table already voids the above premise).
You can apply alpha beta
cutoffs whether the evaluation function is deterministic or not.
I know quite well how alpha-beta is used in practice, but that's not
the point I tried to get across. The cool thing of alpha-beta cutoffs
is that, under certain conditions, it is guaranteed to preserve the
minimax result without searching the full tree. However, if the
evaluation function is non-deterministic that guarantee simply doesn't
make sense any more. I'm not saying that therefore alpha-beta can't be
used; in fact I'll be the first to admit that you can apply it to
*any* domain with a finite number of known legal actions. Not that
that's a good plan, but for sure you will always get something out. In
many games it will even play quite well. In practice pathology in
search is known to be rare, however, my impression is that in Go it
may be more relevant than in other games, such as chess, where, e.g.,
noise actually helps as a weak mobility component. UCT was
specifically designed to deal with uncertainty, so that's why I think
it's better suited for highly uncertain evaluations .
UCT calls the "random play-out" only once on any given position
and there is no reason in principle that it couldn't be a
deterministic evaluation function instead.
Sure, and you could even argue that the "random play-out" *is*
deterministic. Fact remains that the uncertainty/variance of one
playout is in most cases still huge, and the tree-search has to deal
with this (or in Chrilly's words, filter it). Unless the fake mobility
is helping significantly, this could be a severe disadvantage for
alpha-beta type searchers compared to UCT.
To me, a true selective program cuts of a line forever.
I would call that a greedy algorithm (because it never reconsiders
previous decisions).
I have this idea that perhaps a good evaluation function could
replace the play-out portion of the UCT programs.
Agreed.
It only comes down to whether the inherent randomness is critical
to the success of UCT or not. If it isn't, then a play-out is
nothing more than just an evaluation function.
My guess is that the answer which type of search works best for a
given evaluation function depends on the amounts of (deterministic)
bias and (probabilistic) uncertainty in the evaluations (and so far I
see MC mainly as an extremely noisy evaluation with not too much
systematic bias).
Erik
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/