A few months ago there was a post in the computer chess forums about optimizing combinations of features. The technique was called orthogonal multi-testing.
Did I mention that on this forum already? If not, here is a brief description of how it works.

Suppose you have one feature you want to test. You might normally play a 1,000-game match between two versions of the program, one with the feature on and one with it off. However, if you have 6 features, there are 64 combinations. What you can do is play every combination of features in a round-robin tournament. For each feature, half the games played have that feature on and the other half have it off, so you can get a separate score for each feature by considering only the games played between opponents with that feature ON and OFF.

This idea of course assumes that the features are independent (there is no interaction between them). That is probably not true, but it is amazing how often it works anyway; we find that methods such as naive Bayes classifiers work surprisingly well even when the attributes of the thing being classified are not independent. It reminds me a lot of all-moves-as-first, which assumes the order of the moves is not relevant but is a wonderful cheat because you multiply your sample size by an order of magnitude. So you can play a few thousand games with a large number of features and get a huge amount of data by reusing the same samples for different things.

You could also just choose each player's configuration randomly. If you have 256 features, there are far more combinations than you could possibly test, but before each game you simply pick a random combination of features for each player. In this way about half of the games will be relevant for each feature, regardless of how many features you are testing: each side has the feature on with probability 1/2, so exactly one side has it on in about half of the games.
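The random-assignment variant above can be sketched in a few lines. This is only a toy simulation, not anyone's actual test harness: the feature count, the TRUE_GAIN strengths, and the simulate_game logistic stand-in are all invented so the example is self-contained; a real setup would play actual engine games instead.

```python
import random

random.seed(1)

NUM_FEATURES = 6
NUM_GAMES = 10_000

# Hypothetical "true" strength contribution of each feature, used only to
# simulate outcomes; unknown in a real experiment.
TRUE_GAIN = [0.04, -0.02, 0.0, 0.03, 0.0, 0.01]

def random_config():
    # Each feature is switched on or off independently with probability 1/2.
    return [random.random() < 0.5 for _ in range(NUM_FEATURES)]

def simulate_game(a, b):
    """Toy stand-in for a real game: return True if player A wins."""
    diff = sum(TRUE_GAIN[i] * (a[i] - b[i]) for i in range(NUM_FEATURES))
    return random.random() < 0.5 + diff  # diff is small, so this stays in [0, 1]

wins = [0] * NUM_FEATURES   # wins by the side that has feature i ON
games = [0] * NUM_FEATURES  # games where exactly one side has feature i ON

for _ in range(NUM_GAMES):
    a, b = random_config(), random_config()
    a_won = simulate_game(a, b)
    for i in range(NUM_FEATURES):
        if a[i] != b[i]:           # only games that discriminate on feature i count
            games[i] += 1
            if a[i] == a_won:      # the side with the feature ON won
                wins[i] += 1

for i in range(NUM_FEATURES):
    print(f"feature {i}: {wins[i]}/{games[i]} = {wins[i] / games[i]:.3f}")
```

Note that every game contributes to the score of several features at once, which is the whole point: the same sample is reused for each feature that happens to differ between the two players.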
Even if the lack of independence bothers you, one might use this as an approximation, or as a way to get a little closer to the appropriate feature set, in the same way all-moves-as-first nicely approximates the value of much larger samples.

On Tue, Nov 24, 2009 at 11:35 PM, Brian Sheppard <sheppar...@aol.com> wrote:
> >What do you do when you add a new parameter? Do you retain your existing
> >'history', considering each game to have been played with the value of
> >the new parameter set to zero?
>
> Yes, exactly.
>
> >If you have 50 parameters already, doesn't adding a new parameter create
> >a rather large number of new parameter sets, most of which there will
> >never be time to investigate?
>
> Yes. So the new parameter will drift to its optimal value against the
> existing parameter values.
>
> But here's the thing: declining epsilon greedy policies are zero regret
> starting from any initial state. So if the setting of the new parameter
> affects old parameter settings, then the established parameters will start
> to move as well.
>
> If the objective function is a convex function of the parameters (which is
> generally the case, based on the curves that I have seen), then the whole
> system will drift to a global optimum.
>
> >I have been using UCB and UCT to tune engine settings, but I don't think
> >these methods work well to tune more than a handful of parameters at the
> >same time.
>
> Such systems have trouble because their exploration is a *deterministic*
> function of the sequence of wins. That is, all parameters will lock into
> the same set of feedback. If you use UCT, then you have to optimize
> *combinations* of parameters, which is unwieldy.
>
> Declining epsilon greedy is a randomized exploration strategy, but still
> zero-regret. Now the same sequence of wins/losses can be used to tune all
> parameters concurrently.
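Brian's declining-epsilon-greedy idea can be sketched as follows. This is a toy simulation, not his actual code: the binary parameters, TRUE_BEST, the play_game stand-in, and the particular epsilon schedule are all assumptions chosen to keep the example self-contained. Each game, every parameter independently either explores (random value) or exploits (empirically best value), so one stream of wins and losses tunes all parameters concurrently.

```python
import random

random.seed(2)

# Hypothetical binary parameters; in a real engine these would control
# search or evaluation features. TRUE_BEST is the (unknown) optimal setting.
TRUE_BEST = [1, 0, 1]
N = len(TRUE_BEST)

# Per-parameter, per-value win and game counts.
wins = [[0, 0] for _ in range(N)]
games = [[0, 0] for _ in range(N)]

def play_game(setting):
    """Toy stand-in for a real game: each correct parameter adds to the win rate."""
    p = 0.5 + 0.05 * sum(1 if setting[i] == TRUE_BEST[i] else -1 for i in range(N))
    return random.random() < p

for t in range(1, 20_001):
    eps = 1.0 / t ** (1 / 3)  # one possible declining exploration schedule
    setting = []
    for i in range(N):
        if random.random() < eps or games[i][0] == 0 or games[i][1] == 0:
            setting.append(random.randrange(2))  # explore: random value
        else:
            # exploit: pick the value with the better empirical win rate
            setting.append(max((0, 1), key=lambda v: wins[i][v] / games[i][v]))
    won = play_game(setting)
    # One game result updates the statistics of every parameter at once.
    for i in range(N):
        games[i][setting[i]] += 1
        wins[i][setting[i]] += won

best = [max((0, 1), key=lambda v: wins[i][v] / max(games[i][v], 1))
        for i in range(N)]
print("learned settings:", best)
```

Because exploration is randomized per parameter, the parameters do not lock into the same feedback the way a deterministic UCB-style schedule would, which is the contrast Brian draws with UCT.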
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/