Actually, CLOP spends most of its exploration near the optimum value. Here, "near" specifically means "in the region that has win-rate >= mean." Note that this region is shrinking as the algorithm continues. This is a side effect of the Metropolis-Hastings exploration policy.
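To make that concrete, here is a toy sketch (in Python) of the idea, not CLOP's actual algorithm: candidates below a running mean win-rate are rejected, and candidates above it are accepted with a Metropolis-style ratio on their excess over the mean, so exploration concentrates in the region with win-rate >= mean, and that region shrinks as the mean rises. The win_rate() surface, the proposal width, and all constants are made up for illustration.

import numpy as np

rng = np.random.default_rng(0)

def win_rate(x):
    # Hypothetical smooth win-rate surface, peaking at x = 0.3.
    return 0.5 + 0.2 * np.exp(-((x - 0.3) ** 2) / 0.02)

x = 0.0            # current parameter value
mean_wr = 0.5      # running mean win-rate (0.5 acts as a prior observation)
n = 1
samples = [x]
for _ in range(5000):
    x_new = x + rng.normal(scale=0.1)            # local proposal
    w_cur = max(win_rate(x) - mean_wr, 1e-9)     # excess over the mean
    w_new = max(win_rate(x_new) - mean_wr, 0.0)  # candidates below the mean get weight 0
    if rng.random() < min(1.0, w_new / w_cur):   # Metropolis-style acceptance
        x = x_new
    samples.append(x)
    n += 1
    mean_wr += (win_rate(x) - mean_wr) / n       # the mean rises as exploration improves

samples = np.asarray(samples)
print("spread of the second half of the exploration:", samples[len(samples) // 2:].std())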
It is true that CLOP does not actually measure the function value at the point that it recommends. Before CLOP, I tried a home-grown method based on sampling and UCT, but the convergence was horrible, and the method was useless on more than one or two parameters. Probably there are better implementations, but my conclusion was that fitting quadratic regression curves was much faster. So I agree with Rémi about the value of using function approximation to speed optimization along. CLOP improves on what I did by using logistic regression (which is a much better fit to win/loss data) and by making the system work for multiple parameters.

My opinion is that CLOP is at its best when optimizing two or three parameters concurrently. Tuning only takes 20K to 30K games to reach a state that is within a few rating points of optimal, so it does not interfere with the rest of your development cycle. Note that you still need altogether different methods for tuning really large parameter sets, such as pattern weights.

From: [email protected] [mailto:[email protected]] On Behalf Of Olivier Teytaud
Sent: Wednesday, January 04, 2012 10:22 AM
To: [email protected]
Subject: Re: [Computer-go] win rate bias and CLOP

BTW, I imagine that "CLOP" could be any fully automated parameter tuning solution. That is, nothing here is really specific to CLOP. It just happens that CLOP is the first fully automated parameter tuning system that I have made to work.

Hi Brian and others;

I'm not sure that CLOP could be any fully automated parameter tuning solution. The point is that there are roughly two families of algorithms:

1) those in which exploration is roughly equal to recommendation: we test parameters close to our current approximation of the optimum;

2) those in which exploration is really different from recommendation: we test parameters possibly far from the optimum, because the variance is higher and we get more information there.

I assume that CLOP is in category 2. EDA or cross-entropy methods are in category 1. Category 2 is much faster, but assumes some sort of symmetry or regularity, so that what you learn far from the optimum can be used for guessing the optimum. Some people, for philosophical reasons, don't like algorithms of category 2, considering that it is not reasonable to sample by principles like "maximum uncertainty" and regression. I'm not sure, but I think Rémi posted something about that a long time ago, and said that he thinks that in games this "symmetry assumption" is usually not a major problem... but Rémi knows more than me.

In high dimension, I tend to believe that approach 2 is often much better, and also when the noise around the optimum is large; I mean in usual cases (it is clear that we can design counter-examples to such a simple rule). In tests I have made, algorithms in category 2 really make a difference. I have done parameter tuning with algorithms of category 1 and, if I do it again one day, I'll try category 2.

On artificial test beds, category 2 is much faster on, e.g., the noisy objective function f(x) = Bernoulli(c + ||x - x*||^2) with c > 0, where x* is the optimal parameterization: the suboptimality, in terms of the expected objective value of the parameter chosen after n games, decreases as 1/n instead of 1/sqrt(n), neglecting logarithmic factors. On the other hand, for Bernoulli(||x - x*||^2) (the case c = 0), both reach 1/sqrt(n).
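As a concrete illustration of that test bed (this is only a sketch, not CLOP nor any of the algorithms whose rates are discussed; the constants c, x_star, and the number of games are made up), the following simulates f(x) = Bernoulli(c + ||x - x*||^2) in one dimension and recommends a parameter in the category-2 spirit: explore over a wide interval, fit a logistic regression with quadratic features to the win/loss outcomes, and return the minimizer of the fitted curve.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
c, x_star, n_games = 0.1, 0.3, 20000   # made-up constants for the example

# Category-2 style exploration: sample over a wide interval, not only
# near the current best guess.
x = rng.uniform(-0.5, 1.0, size=n_games)
p_fail = np.clip(c + (x - x_star) ** 2, 0.0, 1.0)
y = rng.random(n_games) < p_fail       # outcome 1 with probability c + (x - x*)^2

# Quadratic feature expansion, then a logistic fit to the binary outcomes.
X = np.column_stack([x, x ** 2])
model = LogisticRegression().fit(X, y)

# The fitted logit is b0 + b1*x + b2*x^2; assuming b2 > 0 (true here,
# since the failure probability grows quadratically away from x*), its
# minimizer -b1 / (2*b2) is the recommended parameter.
b1, b2 = model.coef_[0]
print("recommended x =", -b1 / (2.0 * b2), " true optimum =", x_star)

On a toy problem like this the regression-based recommendation should land close to x* even though most of the games were played far from it, which is the point of category 2.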
As far as I know, the rates above are known (proved) for some specific algorithms, but the fact that algorithms in category 1 cannot reach 1/n on f(x) = Bernoulli(c + ||x - x*||^2) has not been proved.

Best regards,
Olivier
