Actually, CLOP spends most of its exploration near the optimum value. Here, "near" specifically means "in the region that has win-rate >= mean." Note that this region is shrinking as the algorithm continues. This is a side effect of the Metropolis-Hastings exploration policy.
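To make that concrete, here is a toy sketch (in Python) of the idea, not CLOP's actual algorithm: candidates below a running mean win-rate are rejected, and candidates above it are accepted with a Metropolis-style ratio on their excess over the mean, so exploration concentrates in the region with win-rate >= mean, and that region shrinks as the mean rises. The win_rate() surface, the proposal width, and all constants are made up for illustration.

import numpy as np

rng = np.random.default_rng(0)

def win_rate(x):
    # Hypothetical smooth win-rate surface, peaking at x = 0.3.
    return 0.5 + 0.2 * np.exp(-((x - 0.3) ** 2) / 0.02)

x = 0.0            # current parameter value
mean_wr = 0.5      # running mean win-rate (0.5 acts as a prior observation)
n = 1
samples = [x]
for _ in range(5000):
    x_new = x + rng.normal(scale=0.1)            # local proposal
    w_cur = max(win_rate(x) - mean_wr, 1e-9)     # excess over the mean
    w_new = max(win_rate(x_new) - mean_wr, 0.0)  # candidates below the mean get weight 0
    if rng.random() < min(1.0, w_new / w_cur):   # Metropolis-style acceptance
        x = x_new
    samples.append(x)
    n += 1
    mean_wr += (win_rate(x) - mean_wr) / n       # the mean rises as exploration improves

samples = np.asarray(samples)
print("spread of the second half of the exploration:", samples[len(samples) // 2:].std())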
It is true that CLOP does not actually measure the function value at the point that it recommends. Before CLOP, I tried a home-grown method based on sampling and UCT, but the convergence was horrible, and the method was useless on more than one or two parameters. Probably there are better implementations, but my conclusion was that fitting quadratic regression curves was much faster. So I agree with Rémi about the value of using function approximation to speed optimization along. CLOP improves on what I did by using logistic regression (which is a much better fit to win/loss data) and by making the system work for multiple parameters.

My opinion is that CLOP is at its best when optimizing two or three parameters concurrently. Tuning only takes 20K to 30K games to reach a state that is within a few rating points of optimal, so it does not interfere with the rest of your development cycle. Note that you still need altogether different methods for tuning really large parameter sets, such as pattern weights.

From: [email protected] [mailto:[email protected]] On Behalf Of Olivier Teytaud
Sent: Wednesday, January 04, 2012 10:22 AM
To: [email protected]
Subject: Re: [Computer-go] win rate bias and CLOP

BTW, I imagine that "CLOP" could be any fully automated parameter tuning solution. That is, nothing here is really specific to CLOP. It just happens that CLOP is the first fully automated parameter tuning system that I have made to work.

Hi Brian and others;

I'm not sure that CLOP could be any fully automated parameter tuning solution. The point is that there are roughly two families of algorithms:

1) those in which exploration is roughly equal to recommendation: we test parameters close to our current approximation of the optimum;

2) those in which exploration is really different from recommendation: we test parameters possibly far from the optimum, because the variance is higher and we get more information there.

I assume that CLOP is in category 2. EDA or cross-entropy methods are in category 1. Category 2 is much faster, but assumes some sort of symmetry or regularity, so that what you learn far from the optimum can be used for guessing the optimum. Some people, for philosophical reasons, don't like algorithms of category 2, considering that it is not reasonable to sample by principles like "maximum uncertainty" and regression. I'm not sure, but I think Rémi posted something about that a long time ago, and said that he thinks that in games this "symmetry assumption" is usually not a major problem... but Rémi knows more than me.

In high dimension, I tend to believe that approach 2 is often much better, and also when the noise around the optimum is large; I mean in usual cases (it is clear that we can design counter-examples to such a simple rule). In tests I have made, algorithms in category 2 really make a difference. I have done parameter tuning with algorithms of category 1 and, if I do it again one day, I'll try category 2.

On artificial test beds, category 2 is much faster on, e.g., the noisy objective function f(x) = Bernoulli(c + ||x - x*||^2) with c > 0, where x* is the optimal parameterization: the suboptimality, in terms of the expected objective value of the parameter chosen after n games, decreases as 1/n instead of 1/sqrt(n), neglecting logarithmic factors. On the other hand, for Bernoulli(||x - x*||^2) (the case c = 0), both reach 1/sqrt(n).
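As a concrete illustration of that test bed (this is only a sketch, not CLOP nor any of the algorithms whose rates are discussed; the constants c, x_star, and the number of games are made up), the following simulates f(x) = Bernoulli(c + ||x - x*||^2) in one dimension and recommends a parameter in the category-2 spirit: explore over a wide interval, fit a logistic regression with quadratic features to the win/loss outcomes, and return the minimizer of the fitted curve.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
c, x_star, n_games = 0.1, 0.3, 20000   # made-up constants for the example

# Category-2 style exploration: sample over a wide interval, not only
# near the current best guess.
x = rng.uniform(-0.5, 1.0, size=n_games)
p_fail = np.clip(c + (x - x_star) ** 2, 0.0, 1.0)
y = rng.random(n_games) < p_fail       # outcome 1 with probability c + (x - x*)^2

# Quadratic feature expansion, then a logistic fit to the binary outcomes.
X = np.column_stack([x, x ** 2])
model = LogisticRegression().fit(X, y)

# The fitted logit is b0 + b1*x + b2*x^2; assuming b2 > 0 (true here,
# since the failure probability grows quadratically away from x*), its
# minimizer -b1 / (2*b2) is the recommended parameter.
b1, b2 = model.coef_[0]
print("recommended x =", -b1 / (2.0 * b2), " true optimum =", x_star)

On a toy problem like this the regression-based recommendation should land close to x* even though most of the games were played far from it, which is the point of category 2.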
As far as I know, the rates above are known (proved) for some specific algorithms, but the fact that algorithms in category 1 cannot reach 1/n on f(x) = Bernoulli(c + ||x - x*||^2) has not been proved.

Best regards,
Olivier
