> Well, empirically, when I set the exploration component to zero it starts
> to play a lot worse. Like I wrote: the winning percentage drops to 24% vs.
> the same program with the exploration component, which is a huge difference.
>
> So if you have a different experience, you must have something else that
> overcomes this hurdle that's not part of a simple MCTS-RAVE implementation.
> I'd be very interested to learn what that is. Sylvain didn't take the bait
> ;-)
Here, we use non-zero initializations of the number of wins, the number of simulations, the number of RAVE-wins and the number of RAVE-losses. We then use an exploration constant of 0, but also an exploratory term which is very different, and of which I am not the main author; I therefore let the main author give an explanation if he wants to :-) I would point out that even before this exploratory term was added, the best UCB-like exploration constant was 0, provided that the numbers of wins, losses, RAVE-wins and RAVE-losses are initialized with heuristic values.
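To make the setup above concrete, here is a minimal Python sketch of child selection with heuristically initialized counters and an exploration constant of 0. It is not MoGo's code: the names (MoveStats, init_with_prior, score, select, record), the prior values, the RAVE blending schedule and its equivalence parameter k are all assumptions, and the separate exploratory term mentioned above is not modeled.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class MoveStats:
    """Per-child counters named after the quantities in the post (hypothetical layout)."""
    wins: float = 0.0
    losses: float = 0.0
    rave_wins: float = 0.0
    rave_losses: float = 0.0


def init_with_prior(winrate: float, weight: float,
                    rave_winrate: float, rave_weight: float) -> MoveStats:
    """Seed all four counters with heuristic 'virtual' results instead of zeros.

    winrate / rave_winrate stand in for some move-evaluation heuristic (assumed);
    weight / rave_weight say how many fake simulations that guess is worth.
    """
    return MoveStats(
        wins=winrate * weight,
        losses=(1.0 - winrate) * weight,
        rave_wins=rave_winrate * rave_weight,
        rave_losses=(1.0 - rave_winrate) * rave_weight,
    )


def score(s: MoveStats, parent_sims: float = 1.0,
          k: float = 1000.0, c: float = 0.0) -> float:
    """Blend the direct and RAVE win rates; add a UCB term only if c > 0."""
    n = s.wins + s.losses
    rn = s.rave_wins + s.rave_losses
    mean = s.wins / n if n > 0 else 0.5
    rave_mean = s.rave_wins / rn if rn > 0 else 0.5
    # Assumed schedule: beta = rn / (n + rn + n*rn/k). It is 1 when there are no
    # direct simulations and decays towards 0 as n grows.
    beta = rn / (n + rn + n * rn / k) if rn > 0 else 0.0
    value = (1.0 - beta) * mean + beta * rave_mean
    if c > 0.0 and n > 0:
        value += c * math.sqrt(math.log(max(parent_sims, 1.0)) / n)
    return value


def select(children: List[MoveStats], parent_sims: float = 1.0,
           c: float = 0.0) -> int:
    """With c == 0 this is purely greedy: the highest blended value wins."""
    return max(range(len(children)),
               key=lambda i: score(children[i], parent_sims, c=c))


def record(s: MoveStats, won: bool) -> None:
    """Add one real playout result to the direct counters."""
    if won:
        s.wins += 1.0
    else:
        s.losses += 1.0


# Usage: a move with a good prior is tried first; after a few real losses its
# mean drops below the other move's prior, so greedy selection switches.
children = [init_with_prior(0.6, 10, 0.5, 0), init_with_prior(0.5, 10, 0.5, 0)]
print(select(children))  # 0
for _ in range(3):
    record(children[0], won=False)
print(select(children))  # 1
```

In this sketch, the heuristic initial values play the role that the exploration bonus plays in plain UCT: an untried or little-tried move starts with a plausible value instead of 0, so a greedy rule still moves on once real results pull the favourite below it. This is one possible reading of why a zero constant can work, not a claim about the program discussed in the thread.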
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/