> From what I understand, for each parameter you take with some high
> probability the best so far, and with some lower probability the least
> tried one. This requires (manually) enumerating all parameters on some
> integer scale, if I got it correctly.
Yes, for each parameter you make a range of values that you believe brackets the optimal value. Then the optimizer takes over and (eventually) concentrates attention on the value that works best. For instance, if your program includes a parameter for the number of RAVE wins to initially assign to an atari move, then you might test 0, 1, 2, ..., 10. (Note that parameters do not have to be integers; there just has to be an integer number of choices.)

Note that a value of 0 sometimes turns off a feature, in effect. When a parameter is turned off, I put the entire code within an if-statement, to approximate the CPU gain from omitting the code entirely:

    if (theParameter != 0) {
        evaluation += theParameter * someFunction();
    }

For the domain of Go, I typically find a broad range of near-optimal values. In fact, the variation over the entire range is often insignificant: perhaps a 1% difference. But I do see 10% gains for certain features, and I never know in advance what will be important. I have tested systems that automatically create new values by interpolation, but I haven't seen any benefit.

> Given a set of parameters each with a range of values, and a set of game
> results between different parameter sets (or a score average for each
> set versus a reference), how to determine:
> a) the most likely optimal set of parameters
> b) the most interesting parameter set to test next

The system above will choose the most interesting parameter set to test next. Whether the policy is zero-regret is more complicated. In the case of a single parameter, the method is zero-regret. If you want to automatically introduce new values to test, then you can still make a zero-regret policy, provided the objective function is bounded and continuous.

The more interesting case is where you have multiple parameters. In that case, I think that my policy will find the global optimum if the objective function is convex. Convexity is not guaranteed in our domain, but it is often the case.
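To make the per-parameter policy concrete, here is a minimal sketch of the selection rule described above: with high probability play the best value so far, otherwise play the least-tried value. The class name, the 0.9 exploit probability, and the 0.5 prior for untried values are my own assumptions for illustration, not details from the original post.

```python
import random

class ParameterTuner:
    """Epsilon-greedy tuner for one parameter with a fixed set of
    candidate values (e.g. 0..10 RAVE wins for an atari move)."""

    def __init__(self, values, exploit_prob=0.9):
        self.values = list(values)
        self.exploit_prob = exploit_prob          # assumed, not from the post
        self.wins = {v: 0 for v in self.values}
        self.games = {v: 0 for v in self.values}

    def choose(self):
        """Pick the value to use in the next game."""
        if random.random() < self.exploit_prob:
            # Exploit: best win rate so far; untried values count as 0.5
            # so they are not permanently starved.
            return max(self.values,
                       key=lambda v: (self.wins[v] / self.games[v]
                                      if self.games[v] else 0.5))
        # Explore: the least-tried value.
        return min(self.values, key=lambda v: self.games[v])

    def record(self, value, won):
        """Record one game result for the value that was played."""
        self.games[value] += 1
        if won:
            self.wins[value] += 1

    def best(self):
        """Current best estimate among values that have been tried."""
        tried = [v for v in self.values if self.games[v]]
        return max(tried, key=lambda v: self.wins[v] / self.games[v])
```

In use, the game loop would call `choose()` before each game and `record(value, won)` afterwards; one such tuner per parameter gives the coordinate-wise scheme discussed above.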
That is, the benefit is small if a parameter is either too small or too large, and the optimum is somewhere in between.

I could be wrong about some of these statements, because I haven't paused to try to prove anything from a mathematical perspective. Instead, I am convinced by experimental observation of some pragmatic outcomes:

1) Does the optimizer tweak parameters towards optimality? (Yes.)
2) Does the optimizer work without needing my attention? (Yes.)
3) If I change the program in other ways, will it adapt? (Yes.)
4) If I code a bad feature, will the optimizer stop using it? (Yes.)

Altogether, that is not bad value.

I should mention one tweak, relating to the fact that your program might sometimes face strong players and other times weak players. I keep a record of the average rating of the opponents faced by each parameter value. Then, when I explore, I can choose either the least-taken parameter value or the parameter value whose average opponent rating would be most affected by playing this game. This policy keeps average opponent ratings in line.

Brian

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/