I forgot to say that there are commonly used benchmark problems for developers of optimization algorithms; they appear routinely in publications to compare different algorithms. I am not personally familiar with them, but they would be easy to google.
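For instance, the Rosenbrock function is one of the classic test problems. A minimal sketch of how such a benchmark could be coded against CM (the class name is mine, purely for illustration):

    import org.apache.commons.math3.analysis.MultivariateFunction;

    // Classical Rosenbrock test function:
    //   f(x, y) = (1 - x)^2 + 100 * (y - x^2)^2,
    // global minimum 0 at (1, 1); hard for many optimizers
    // because of its curved, narrow valley.
    public class Rosenbrock implements MultivariateFunction {
        @Override
        public double value(double[] p) {
            final double x = p[0];
            final double y = p[1];
            return (1 - x) * (1 - x) + 100 * (y - x * x) * (y - x * x);
        }
    }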
On Dec 1, 2012, at 1:31 PM, Gilles Sadowski <gil...@harfang.homelinux.org> wrote:

> Hello.
>
>> Now that I have some time, let me try to make my case clearly. First I
>> want to say that this is not some attack on the automatic-differentiation
>> package. I love automatic-differentiation and symbolic packages. I
>> personally cannot compute a derivative without a computer for the life
>> of me. And I am also glad we are finally starting to agree on
>> something :)
>
> I don't think that we disagreed about the fact that there was a problem
> with the interface to the user application. [_I_ started this thread
> hinting that there was a problem (at least for me as a user, based on my
> use-case).]
>
> [Then there were several misunderstandings about what the problem was and
> how to solve it...]
>
>> This discussion is about figuring out how an incorrect framework and
>> storage affect the performance of an optimization. That is why I am so
>> worried.
>
> I can understand that this is an important point for your use-case.
> There are now two vantage points (at least) on two aspects to consider:
> you seem to have experience where storage matters (while I don't); you
> worry about "small" object allocation (while I don't, anymore).
>
> I'm not saying that your worries do not count; quite the contrary. CM is
> not my "toolbox"; it's a library based on the knowledge of its developers
> (obviously).
> If you bring actual use-cases (i.e. unit tests or real application code
> that can be benchmarked) that show worrying behaviour, it will certainly
> trigger swift corrective action. [This has happened recently.]
>
> In this area (performance), the problem is that intuition (or educated
> guessing, however you want to name it) is not a substitute for actual
> runs, sometimes by a large amount. [When I started to use CM, I raised
> the issue that a 3D vector was immutable (so that I had to create a new
> one whenever its coordinates changed). Surely this was a performance hit!
> Actually it wasn't.]
>
> Again, that does not mean that you are wrong in this case. It's just that
> we cannot jump and change the code every time someone comes up with what
> amounts to "conventional wisdom". If the person comes with a patch that
> readily proves the point (at least marginally, through microbenchmarking),
> then we all benefit.
>
>> So let's start with the basics. A Newton method must compute a descent
>> step by solving a linear equation
>>
>>     H * p = -g,                                            (1)
>>
>> where H is the Hessian, g is the gradient, and p is the desired descent
>> step. What I am about to say also holds for non-linear least-squares
>> methods, where the Hessian is approximated using the Jacobian as
>>
>>     H \approx J^T * J + \lambda * I.
>>
>> Now, if you are not solving the kind of simple optimization problem you
>> keep giving examples for, the Hessian can easily be a very large matrix,
>> 1000x1000 or larger. You had better hope that you are storing H using
>> your linear algebra framework; otherwise, solving eq. (1) is going to
>> take a while.
>> This is actually a very active area of research, and that is why having
>> sparse linear algebra (aren't you removing this? ;) ) and iterative
>> solvers is important to a lot of people.
>
> Yes, we are removing it because it is buggy and _nobody_ stepped up to
> say that it was important for CM users, or to help solve the problems in
> a way consistent with real-life usage of such a feature.
> As Sébastien said, you are warmly welcome to help us find a way to keep
> the feature.
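To make eq. (1) concrete, here is a minimal sketch of a single damped Gauss-Newton step written against the existing math3 linear package (the class and method names are mine, purely for illustration). For a large or sparse H, this dense factorization is exactly the part that becomes prohibitive:

    import org.apache.commons.math3.linear.CholeskyDecomposition;
    import org.apache.commons.math3.linear.MatrixUtils;
    import org.apache.commons.math3.linear.RealMatrix;
    import org.apache.commons.math3.linear.RealVector;

    public class DampedGaussNewton {
        // One descent step: solve H * p = -g, eq. (1), with the
        // least-squares approximation H = J^T * J + lambda * I.
        static RealVector step(RealMatrix j, RealVector g, double lambda) {
            final int n = j.getColumnDimension();
            final RealMatrix h = j.transpose().multiply(j)
                    .add(MatrixUtils.createRealIdentityMatrix(n)
                            .scalarMultiply(lambda));
            // For lambda > 0, H is symmetric positive definite, so a
            // Cholesky factorization is the natural dense solver.
            return new CholeskyDecomposition(h).getSolver()
                    .solve(g.mapMultiply(-1.0));
        }
    }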
>> What you are proposing as good OO style
>
> This discussion really has nothing to do with "OO style", merely code
> reuse; and it was my big misunderstanding (partly because of my lack of
> knowledge, partly because of the lack of documentation on the targeted
> usages of "DerivativeStructure", which IIUC now are outside CM) that I
> believed that the new "MultivariateDifferentiableFunction" was the way
> to go.
>
> [Also, Luc had been moving very fast on this. And I couldn't keep up,
> although I had wondered earlier why this had to impact usage in the
> "optimization" package.]
>
>> is that if someone has a function that they want to optimize, they need
>> to take what is probably already a RealMatrix or double[][], and create
>> 1000x1000 DerivativeStructure objects, that,
>
> IIUC, that would be 1000 "DerivativeStructure" objects (not 1000x1000).
> If so, for this example, using "DerivativeStructure" would lead to about
> a 4% increase in storage memory.
>
>> within the next 10 lines in the optimizer, are going to be converted
>> back to a RealMatrix? Not only have you just fragmented your heap space,
>> but your garbage collector is going crazy, and you are wasting an
>> incredible amount of memory.
>
> This is the kind of assertion that really needs support from actual code.
> [Again, I don't claim to know better; I think that it would really be
> useful to accumulate a list of "dos and don'ts" for Java and CM, in the
> form of reproducible user experience.]
>
>> Now imagine that your Jacobian is actually very simple to compute but
>> large; then this whole conversion back and forth actually takes much
>> longer than the function evaluation.
>
> We are willing to take this into account, but since I cannot see the
> behaviour in my use-case (where the evaluation time overwhelms, by
> several orders of magnitude, all such considerations), I do not have any
> incentive to change what seems to be "efficient enough" (for the actually
> known cases).
> Again, your experience with other use-cases would be very valuable for
> analyzing the efficiency of the CM implementations and improving them,
> if necessary.
>
>> So, why exactly are you insisting on taking this performance penalty?
>
> It's the other way around. Until the performance penalty is proven, we've
> decided that it's up to the developer willing to do the work to decide.
> Again, if there is a patch, it makes the matter much easier to decide on.
>
> I admit that we decided to change the internal workings of the
> optimizers, so _we_ should have proved that it does not impact usability.
> Hence, again, the usefulness of having a test suite of "real"
> applications which could be benchmarked regularly in order to detect
> performance regressions induced by otherwise welcome changes.
> [I raised that issue several times on the ML.]
>
>> You claim that the optimizer can work better if it gets a
>> DerivativeStructure; why?
>
> This was (supposed to be) an improvement of the _internal_ design.
> [Several weeks ago, Luc posted the problems he was encountering.]
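To illustrate the round trip I am worried about, here is a minimal sketch, assuming I read the DerivativeStructure API correctly, of packing an already-computed gradient into a DerivativeStructure only to pull the same numbers back out:

    import org.apache.commons.math3.analysis.differentiation.DerivativeStructure;

    public class GradientRoundTrip {
        // Pack a value and its already-computed gradient into a
        // first-order DerivativeStructure (one object per component).
        static DerivativeStructure wrap(double value, double[] gradient) {
            final int n = gradient.length;
            final double[] packed = new double[n + 1];
            packed[0] = value;                            // f
            System.arraycopy(gradient, 0, packed, 1, n);  // df/dx_1 .. df/dx_n
            return new DerivativeStructure(n, 1, packed);
        }

        // ... and pull the same numbers back out a few lines later.
        static double[] unwrap(DerivativeStructure ds) {
            final int n = ds.getFreeParameters();
            final double[] gradient = new double[n];
            final int[] orders = new int[n];
            for (int i = 0; i < n; i++) {
                orders[i] = 1;
                gradient[i] = ds.getPartialDerivative(orders);
                orders[i] = 0;
            }
            return gradient;
        }
    }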
>> Where does the fact that you are holding a DerivativeStructure come into
>> effect for a Newton method? Can you provide any literature in that
>> regard? The classical optimization bottleneck, if not the actual
>> function evaluation, is eq. (1). The optimization methodology is built
>> on top of years of research in computational linear algebra, and does
>> not care how the matrices are actually computed. Efficient memory usage
>> and speed are critical here, because in Newton optimizations eq. (1) is
>> solved thousands of times. The goal of the Jacobian is not to store
>> derivatives; it is to store a matrix of numbers that allows you to
>> quadratically approximate your target function.
>
> OK. Then Luc's proposal seems appropriate.
> There would be new interfaces defining the "optimize" method
> appropriately. For algorithms that need the gradient, it must be provided
> in an additional argument, of type "MultivariateVectorFunction"; for
> those that need the Jacobian, the argument would be of type
> "MultivariateMatrixFunction".
> Do you agree?
>
>> You guys are focusing on people using finite differences and trying to
>> optimize a Newton method to use finite differences. That is not the main
>> purpose of a Newton method. If you want a method that dynamically
>> adjusts the finite-difference step size, you should switch to BOBYQA,
>> which was designed for this case.
>
> Another misunderstanding, probably. [The "stepSize" discussion was sort
> of a digression.]
>
>> Derivatives can be computed by a computer using a symbolic toolbox a
>> priori (something I was referring to when I accidentally said
>> auto-differentiation). They can also be efficiently simplified by that
>> toolbox before being pasted back into your program. Auto-diff could also
>> be an important tool for people whose problem is simple or who are not
>> concerned with optimal efficiency. This can easily be handled by a
>> wrapper and has nothing to do with Newton optimization.
>
> Maybe we should also change the "NewtonSolver" (in package
> "o.a.c.m.analysis.solvers"). [In the same way we'll do for the optimizer
> with derivatives. Actually those two packages were improved in parallel,
> so that the interfaces would be very similar from a usage point of view.]
>
>> If you want to talk about proper OO design and internal simplification,
>> then you should focus on being able to pass a linear solver into your
>> optimization method. This way you allow people to use iterative and
>> sparse solvers when needed.
>
> This is a new feature; contributions are welcome. [CM is primarily
> developed by people who use (part of) it; if they don't need a feature,
> or don't know how to implement it, or do not have the time to implement
> it, it won't be implemented.]
>
>> If you are concerned about people getting derivatives wrong, you can do
>> what MATLAB does, which is to add an option to check the derivatives
>> against finite differences when debugging.
>
> IMHO, the top-priority feature to help in debugging is to introduce
> logging statements!
>
>> This brings me to my second issue. There are important problems where
>> computing the derivatives together with the actual function evaluation
>> is *significantly* faster than computing each alone. I am not clear why
>> I am hitting such resistance with this. [...]
>
> [If you think that you were met with strong resistance, I'd suggest that
> you look in the archives for the discussions about exceptions in CM...]
>
> Unless I'm mistaken, CM does not prevent you from writing a class that
> takes advantage of combining the computations.
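Here is a toy sketch of the kind of case I mean; the model and class names are hypothetical. For f_i(a, b) = a * exp(b * t_i), the exp() calls dominate the cost, and a combined evaluation computes each of them once instead of twice:

    // Toy model f_i(a, b) = a * exp(b * t_i): the exp() terms dominate
    // the cost and are shared between the values and the Jacobian.
    public class ExpModel {
        private final double[] t;

        ExpModel(double[] t) {
            this.t = t;
        }

        // Single pass: each exp() is evaluated once, then reused for both
        // the function value and the two partial derivatives.
        void valueAndJacobian(double a, double b, double[] f, double[][] jac) {
            for (int i = 0; i < t.length; i++) {
                final double e = Math.exp(b * t[i]);  // expensive shared term
                f[i] = a * e;                         // f_i
                jac[i][0] = e;                        // df_i/da
                jac[i][1] = a * t[i] * e;             // df_i/db
            }
        }
    }

With separate value() and Jacobian callbacks, every exp(b * t_i) would be computed twice per iteration.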
>> What I suggested as function interfaces was just an initial quick
>> suggestion that I would be happy for anyone to change in a way that
>> everyone feels positive about. I just don't want my second issue to be
>> ignored like a non-issue.
>
> I'm getting confused. Could you please restate what those first and
> second issues were?
>
> Thanks,
> Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org