Re: [computer-go] Justification for c==0

Olivier Teytaud Fri, 01 May 2009 10:58:30 -0700

> Theorem: In a finite game tree with no cycles, with binary rewards, the UCT
> algorithm with c==0
> converges (in the absence of computational limitations) to the game
> theoretic optimal policy.
>


This is also tree with RAVE instead of UCT, if you ensure that RAVE values
are never below some positive constant (this was posted in discussions
between Sylvain and me a long time ago in this mailing list, but never
published I guess).

By the way:
- I guess there are several codes in which c=0, or c>0 but with no
significant
  improvement over c=0. If you want to publish scientific papers, claim that
c>0;
  UCT is so widely accepted that your paper is more likely to
  be accepted with c>0   :-)
- even withou parallelization, mogo is better with constant 0.
- on the other hand, and in particular for stochastic games with one player
(which
   is industrially quite relevant), I find UCT quite good. Also, progressive
widening
   as published by Rémi is efficient in this one-player setting (but with
different
   constants). I'm definitely convinced that UCT provides a very great
   improvement, e.g. in front of Bellman's dynamic programming, in many
cases.
- Another point is that with the pattern-based exploration term (not
UCT-like), we
   have an exploration term; but not at all UCT-like (something like
   Heuristic(move)/log(2+numberOfSimulations) ).

A somewhat philosophical point related to the fact that UCT is not so
efficient for Go, is that clear sound algorithms are probably not the main
elements in Go; plenty of parameters (the coefficients of patterns, the
choice of Monte-Carlo player) are necessary and do not follow short and
sound rules. If you want to publish scientific papers,
do not say that - you should preferably hide the details, and show only a
small and well chosen part of your results, so that people can believe that
a small sound idea is the explanation for the fact that the program works.
But, in the particular case of Go,
and somewhat disappointingly, I'm not sure that reality is like that. I
prefer
nonlinear optimization or applications of UCT in one-player games or
applications for this point of view - there, sound ideas usually work and
outperform
dirty tricks :-)

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Justification for c==0

Reply via email to