> On that topic, I have around 17 flags that enable or disable features in my
> pure-playout bots, and I want to find the best combination of them.
> I know this is almost a dream, but does anyone know the best way to
> approximate this?

Pebbles randomly chooses parameter values before each game, using a strategy with zero asymptotic regret. I literally never tune parameters for Pebbles by hand. I just set up experiments and attach each one to a parameter that my optimizer manages. After a few hundred games it is clear what the right choices are.

My favorite exploration strategy is a declining epsilon-greedy strategy. I like it because it is randomized, so I can optimize all parameters concurrently using a single stream of games. In this strategy, one draws a random number p between 0 and 1, then selects the parameter value with the highest historical mean if p > epsilon, and the value tried least often otherwise. If epsilon = C*log(n)/n, where C is a constant and n is the number of experiments so far, then the strategy has zero asymptotic regret.

Pebbles has about 50 parameters right now. Most are pretty settled because they have thousands of games of experience. All are potentially modified before each game.

Brian

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
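
For readers who want something concrete, here is a minimal Python sketch of the declining epsilon-greedy rule described in the reply. It is not Pebbles' actual code. The class name DecliningEpsilonGreedy, the helper run_games, the play_game callback, the default C = 1.0, and the choice to force exploration while n < 2 (where log(n)/n is zero or undefined) are all assumptions made for illustration.

```python
import math
import random


class DecliningEpsilonGreedy:
    """Tunes one parameter that has a finite set of candidate values."""

    def __init__(self, values, c=1.0, rng=None):
        self.values = list(values)
        self.c = c  # the constant C in epsilon = C*log(n)/n (1.0 is an assumption)
        self.rng = rng or random.Random()
        self.plays = [0] * len(self.values)     # games played with each value
        self.score = [0.0] * len(self.values)   # summed results for each value

    def epsilon(self):
        n = sum(self.plays)
        if n < 2:
            return 1.0  # assumption: always explore until log(n)/n is positive
        return min(1.0, self.c * math.log(n) / n)

    def _mean(self, i):
        # Untried values get a mean of 0.0 (assumption); exploration reaches
        # them first anyway because they are the least-played.
        return self.score[i] / self.plays[i] if self.plays[i] else 0.0

    def choose(self):
        """Return the index of the value to use in the next game."""
        indices = range(len(self.values))
        p = self.rng.random()
        if p > self.epsilon():
            return max(indices, key=self._mean)             # highest historical mean
        return min(indices, key=lambda i: self.plays[i])    # tried least often

    def update(self, index, result):
        """Record a game result (1.0 for a win, 0.0 for a loss)."""
        self.plays[index] += 1
        self.score[index] += result


def run_games(tuners, play_game, num_games):
    """Tune every parameter concurrently from a single stream of games."""
    for _ in range(num_games):
        picks = {name: t.choose() for name, t in tuners.items()}
        settings = {name: tuners[name].values[i] for name, i in picks.items()}
        result = play_game(settings)  # hypothetical callback: plays one game
        for name, i in picks.items():
            tuners[name].update(i, result)
```

Each flag gets its own tuner, all tuners pick a value before every game, and every tuner is credited with that game's result. That is one way to read "optimize all parameters concurrently using a single stream of games".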