On Jan 21, 2009, at 11:53 AM, Olivier Teytaud wrote:
Here, we have a non-zero initialization of the number of wins, of
the number of simulations, of the number of Rave-wins, of the
number of Rave-losses.
We then have a 0 constant for exploration, but also an exploratory
term which is very different, and for which I am not the main author
- therefore I will let the main author give an explanation if he
wants to :-)
I point out that even before this exploratory term, the best UCB-
like exploration-constant was 0 - as soon as the initializations of
numbers of wins, of losses, of Rave-wins, of Rave-losses are
heuristic values.
I'd like to make sure I understand exactly what you mean. You use some
heuristics to initialize all the moves (or maybe some of the moves)
with certain win-loss and Rave-win-loss ratios?
To a certain extent I suppose these could come from the reading of the
previous move? I think I slowly start to make sense of things...
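
To check my reading, here is a minimal Python sketch of what I think is
being described. Everything in it is my own assumption, not the actual
implementation: the MoveStats class, the prior values, the move names and
especially the way the plain and Rave averages are blended (the beta
weighting) are hypothetical. The only things taken from the description
above are the non-zero initial counts and the exploration constant of 0.

```python
import math
from dataclasses import dataclass


@dataclass
class MoveStats:
    # All four counters start from heuristic (non-zero) values
    # instead of 0. The numbers used below are made up.
    wins: float
    visits: float
    rave_wins: float
    rave_losses: float

    def score(self, exploration=0.0, parent_visits=1.0):
        """Blend plain and Rave win rates, plus an optional UCB-like bonus."""
        mean = self.wins / self.visits
        rave_visits = self.rave_wins + self.rave_losses
        rave_mean = self.rave_wins / rave_visits
        # Assumed weighting: trust Rave less as real simulations accumulate.
        beta = rave_visits / (rave_visits + self.visits)
        # With exploration == 0 this bonus vanishes entirely.
        bonus = exploration * math.sqrt(math.log(parent_visits) / self.visits)
        return (1.0 - beta) * mean + beta * rave_mean + bonus

    def update(self, won):
        """Record the result of one real simulation through this move."""
        self.visits += 1
        if won:
            self.wins += 1


def best_move(moves):
    """Pick the move with the highest score, using exploration constant 0."""
    return max(moves, key=lambda name: moves[name].score())


# Hypothetical priors: the heuristic likes one move and dislikes the other.
moves = {
    "good_shape": MoveStats(wins=7, visits=10, rave_wins=7, rave_losses=3),
    "empty_triangle": MoveStats(wins=3, visits=10, rave_wins=3, rave_losses=7),
}
first_choice = best_move(moves)

# If the favoured move keeps losing, its score eventually drops below the
# other move's prior, so the search switches without any exploration term.
for _ in range(30):
    moves["good_shape"].update(won=False)
later_choice = best_move(moves)
```

If that is roughly right, the priors themselves do the work an exploration
bonus would otherwise do: a move is tried because its initial counts look
promising, and abandoned once real results outweigh them.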
Mark
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/