[computer-go] Optimizing combinations of flags

Brian Sheppard Tue, 24 Nov 2009 20:36:46 -0800

>What do you do when you add a new parameter? Do you retain your existing
>'history', considering each game to have been played with the value of
>the new parameter set to zero?


Yes, exactly.

>If you have 50 parameters already, doesn't adding a new parameter create
>a rather large number of new parameter sets, most of which there will
>never be time to investigate?

Yes. So the new parameter will drift to its optimal value against the
existing parameter values.
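
A minimal sketch of that bookkeeping, assuming per-value win/game counters are kept for each parameter. The names ParamStats and add_parameter are hypothetical, not taken from any actual engine. The new parameter starts with the whole existing history credited to value 0, so its other values begin with no data and are only reached through exploration.

```python
class ParamStats:
    """Win/game counts for each candidate value of one parameter (hypothetical)."""

    def __init__(self, values):
        self.values = list(values)
        self.games = {v: 0 for v in self.values}
        self.wins = {v: 0 for v in self.values}

    def record(self, value, won):
        # Credit one finished game to the value that was in effect.
        self.games[value] += 1
        self.wins[value] += int(won)

    def win_rate(self, value):
        g = self.games[value]
        return self.wins[value] / g if g else 0.0


def add_parameter(stats, name, values, total_games, total_wins):
    """Register a new parameter, treating every past game as played with value 0."""
    if 0 not in values:
        raise ValueError("the new parameter needs 0 among its candidate values")
    p = ParamStats(values)
    p.games[0] = total_games   # retained history
    p.wins[0] = total_wins
    stats[name] = p
    return p


# Example: 1000 past games, 520 of them won, all credited to new_flag = 0.
stats = {}
add_parameter(stats, "new_flag", [0, 1], total_games=1000, total_wins=520)
```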

But here's the thing: declining epsilon-greedy policies are zero-regret
starting from any initial state. So if the setting of the new parameter
affects old parameter settings, then the established parameters will start
to move as well.

If the objective function is a convex function of the parameters (which is
generally the case, based on the curves that I have seen) then the whole
system will drift to a global optimum.

>I have been using UCB and UCT to tune engine settings, but I don't think
>these methods work well to tune more than a handful of parameters at the
>same time.

Such systems have trouble because their exploration is a *deterministic*
function of the sequence of wins. That is, all parameters will lock into the
same set of feedback. If you use UCT, then you have to optimize
*combinations* of parameters, which is unwieldy.

Declining epsilon-greedy is a randomized exploration strategy, but still
zero-regret. Now the same sequence of wins/losses can be used to tune all
parameters concurrently.
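
To make the idea concrete, here is a rough sketch of tuning several parameters at once from a single stream of game results. It is an illustration under stated assumptions, not the actual tuner: the Tuner class, the 1/sqrt(n+1) epsilon schedule, and the neutral 0.5 estimate for untried values are all choices made for the example. Each parameter makes its own independent explore/exploit draw, and every game result updates the counters of every parameter.

```python
import random


class Tuner:
    """Per-parameter declining epsilon-greedy over shared game results (sketch)."""

    def __init__(self, param_values, rng=None):
        # param_values: dict mapping parameter name -> list of candidate values
        self.values = {k: list(v) for k, v in param_values.items()}
        self.games = {k: {v: 0 for v in vs} for k, vs in self.values.items()}
        self.wins = {k: {v: 0 for v in vs} for k, vs in self.values.items()}
        self.total_games = 0
        self.rng = rng or random.Random()

    def epsilon(self):
        # Assumed schedule: shrinks toward zero, but exploration never stops.
        return 1.0 / (1.0 + self.total_games) ** 0.5

    def _rate(self, name, value):
        g = self.games[name][value]
        # Assumed neutral estimate for a value that has never been tried.
        return self.wins[name][value] / g if g else 0.5

    def choose(self):
        """Pick a value for every parameter, each with its own random draw."""
        eps = self.epsilon()
        setting = {}
        for name, vals in self.values.items():
            if self.rng.random() < eps:
                setting[name] = self.rng.choice(vals)      # explore
            else:
                setting[name] = max(vals, key=lambda v: self._rate(name, v))  # exploit
        return setting

    def update(self, setting, won):
        """Credit one game result to the chosen value of every parameter."""
        self.total_games += 1
        for name, value in setting.items():
            self.games[name][value] += 1
            self.wins[name][value] += int(won)
```

A tuning loop would call choose(), play one game with the returned settings, and pass the result to update(). Because the explore/exploit draw is made separately for each parameter, the same win/loss sequence yields different information about each one.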


