>On that topic, I have around 17 flags that enable or disable features in
>my pure-playout bots, and I want to search for the best combination of
>them. I know this is almost a dream, but does anyone know the best way to
>approximate this?

Pebbles randomly chooses parameter values before each game, using a
strategy with zero asymptotic regret. I literally never tune parameters for
Pebbles by hand. I just set up each experiment as a parameter and hand it
to my optimizer to manage. After a few hundred games it is clear what the
right choices are.

My favorite exploration strategy is a declining epsilon-greedy strategy. I
like it because it is a randomized strategy, so I can optimize all
parameters concurrently using a single stream of games. In this strategy,
one draws a random number p between 0 and 1, then selects the value with
the highest historical mean if p > epsilon, and the value taken least often
otherwise. If epsilon = C*log(n)/n, where n is the number of experiments so
far, then the strategy has zero asymptotic regret.
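
For anyone who wants to try this, here is a minimal sketch of the rule
described above. It is not Pebbles' actual code. The class name
EpsilonGreedyParam, the tune() helper, the play_game callback, the default
C = 1.0, and the choice to explore unconditionally while n < 2 (where
log(n)/n is not yet useful) are all assumptions made for illustration.

```python
import math
import random


class EpsilonGreedyParam:
    """One tunable parameter with a finite set of candidate values.

    Hypothetical helper, not taken from any real engine.
    """

    def __init__(self, values, c=1.0, rng=None):
        self.values = list(values)
        self.c = c  # the constant C in epsilon = C*log(n)/n (assumed default)
        self.rng = rng or random.Random()
        self.plays = [0] * len(self.values)    # games played per value
        self.score = [0.0] * len(self.values)  # summed results per value

    def epsilon(self):
        n = sum(self.plays)
        if n < 2:
            return 1.0  # assumption: always explore until there is some data
        return min(1.0, self.c * math.log(n) / n)

    def mean(self, i):
        # Unplayed values are treated as mean 0.0 (an assumption).
        return self.score[i] / self.plays[i] if self.plays[i] else 0.0

    def choose(self):
        """Return the index of the value to use in the next game."""
        p = self.rng.random()
        indices = range(len(self.values))
        if p > self.epsilon():
            return max(indices, key=self.mean)               # exploit
        return min(indices, key=lambda i: self.plays[i])     # explore

    def update(self, i, result):
        """Record a game result (e.g. 1.0 for a win, 0.0 for a loss)."""
        self.plays[i] += 1
        self.score[i] += result

    def best_value(self):
        return self.values[max(range(len(self.values)), key=self.mean)]


def tune(params, play_game, n_games):
    """Tune every parameter concurrently from one stream of games.

    params maps a name to an EpsilonGreedyParam; play_game is a
    caller-supplied function taking {name: value} and returning a result.
    """
    for _ in range(n_games):
        picks = {name: p.choose() for name, p in params.items()}
        settings = {name: params[name].values[i] for name, i in picks.items()}
        result = play_game(settings)
        # Every parameter is credited with the same game result.
        for name, i in picks.items():
            params[name].update(i, result)
    return params
```

With 17 on/off flags, one would build 17 such objects with values
[False, True], pass them to tune() together with a function that plays one
game using the chosen settings, and read off best_value() for each flag
afterwards. Note that this treats the flags independently, so it will not
detect interactions between them.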

Pebbles has about 50 parameters right now. Most are pretty settled because
they have thousands of games of experience. All are potentially modified
before each game.

Brian

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
