Hi,
2012/12/1 Konstantin Berlin <kber...@gmail.com>:

> Hi,
>
> Now that I have some time, let me try to make my case clearly. First I want to say that this is not some attack on the automatic-differentiation package. I love automatic-differentiation and symbolic packages. I personally cannot compute a derivative without a computer for the life of me. And I am also glad we are finally starting to agree on something :)
>
> This discussion is about figuring out how an incorrect framework and storage affects the performance of an optimization. That is why I am so worried.
>
> So let's start with the basics. A Newton method must compute a descent step by solving a linear equation
>
>     H*p = -g,     (1)
>
> where H is the Hessian, g is the gradient, and p is the desired descent step. What I am about to say also holds for the non-linear least-squares method, where the Hessian is approximated using the Jacobian as
>
>     H \approx J^T*J + \lambda*I.
>
> Now, if you are not solving a simple optimization problem of the kind you keep giving examples for, you can easily have the Hessian be a very large matrix, like 1000x1000 or larger. You had better hope that you are storing H using your linear algebra framework, otherwise the eq. (1) computation is going to take a while.
>
> This is actually a very active area of research, and that is why having sparse linear algebra (aren't you removing this? ;) ) and iterative solvers is important to a lot of people.

A few months ago, we started a thread on this issue (on the users ML). It got no answers! I am personally not happy with removing the support for sparse vectors/matrices, but the truth is we didn't see a way to achieve the same degree of correctness as --say-- array-based matrices and vectors. As far as I am concerned, though, this is still an open question, and if you have ideas about these issues, we would be glad to hear them!

> What you are proposing as good OO style is that if someone has a function that they want to optimize, they need to take what is probably already a RealMatrix or [][], and create 1000x1000 DerivativeStructure objects that, within the next 10 lines in the optimizer, are going to be converted back to a RealMatrix? Not only have you just fragmented your heap space, but your garbage collector is going crazy, and you are wasting an incredible amount of memory. Now imagine that your Jacobian is actually very simple to compute but large; then this whole conversion back and forth actually takes much longer than the function evaluation.
>
> So, why exactly are you insisting on taking this performance penalty? You claim that the optimizer can work better if it gets a DerivativeStructure; why? Where does the fact that you are holding a DerivativeStructure come into effect for a Newton method? Can you provide any literature in that regard? The classical optimization bottleneck, if not the actual function evaluation, is eq. (1). The optimization methodology is built on top of years of research in computational linear algebra, and does not care how the matrices are actually computed. Efficient memory usage and speed are critical here, because in Newton optimizations eq. (1) is evaluated thousands of times. The goal of the Jacobian is not to store derivatives; it is to store a matrix of numbers that allows you to quadratically approximate your target function.
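Side note, just to make eq. (1) concrete in [math] terms: the inner solve is a single call to the existing linear algebra package. A minimal sketch (the wrapper class itself is only illustrative; the decomposition classes are the ones already in o.a.c.m.linear):

    import org.apache.commons.math3.linear.Array2DRowRealMatrix;
    import org.apache.commons.math3.linear.ArrayRealVector;
    import org.apache.commons.math3.linear.CholeskyDecomposition;
    import org.apache.commons.math3.linear.RealMatrix;
    import org.apache.commons.math3.linear.RealVector;

    public class NewtonStepExample {

        /** Solves H * p = -g for the step p, assuming H is symmetric positive
         *  definite (as in the Gauss-Newton approximation J^T*J + lambda*I). */
        static RealVector newtonStep(RealMatrix hessian, RealVector gradient) {
            // Cholesky is the natural choice for an SPD Hessian; the LU or QR
            // decompositions from the same package handle the general case.
            return new CholeskyDecomposition(hessian)
                    .getSolver()
                    .solve(gradient.mapMultiply(-1.0));
        }

        public static void main(String[] args) {
            RealMatrix h = new Array2DRowRealMatrix(new double[][] { { 4, 1 }, { 1, 3 } });
            RealVector g = new ArrayRealVector(new double[] { 1, 2 });
            System.out.println(newtonStep(h, g)); // the descent step p
        }
    }

The point of the example is simply that this step only ever sees plain matrices and vectors, whatever way the Hessian/Jacobian was produced.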
> You guys are focusing on people using finite differences and trying to optimize a Newton method to use finite differences. This is not what the main purpose of a Newton method is. If you want a method that dynamically adjusts the finite-difference step size, you should switch to BOBYQA, which was designed for this case.
>
> Derivatives can be computed by a computer using a symbolic toolbox a priori (something I was referring to when I accidentally said auto-differentiation). They can also be efficiently simplified by that toolbox before being pasted back into your program. Auto-diff could also be an important tool for people if their problem is simple or they are not concerned with optimal efficiency. This can easily be handled by a wrapper and has nothing to do with Newton optimization. If you want to talk about proper OO design and internal simplification, then you should focus on being able to pass a linear solver into your optimization method. This way you allow people to use iterative and sparse solvers when needed. If you are concerned about people getting derivatives wrong, you can do what MATLAB does, which is to add an option to check the derivatives using finite differences when debugging.
>
> This brings me to my second issue. There are important problems where computation of the derivatives combined with the actual function evaluation is *significantly* faster than computing each alone. I am not clear why I am hitting such resistance with this. For example, you are doing some sort of a simulation which can be trivially adjusted in the end to give a derivative or a function value. A very real example of this is a Fast Multipole Method, which takes time to compute a series expansion of the function, but the resulting series expansion can be analytically differentiated.
>
> What I suggested as function interfaces was just an initial quick suggestion that I would be happy for anyone to change in a way that everyone feels positive about. I just don't want my second issue to be ignored like a non-issue.

This is certainly not a non-issue. Thanks for *your* time,
Sébastien

> Thanks for your time,
> Konstantin
>
> On Nov 30, 2012, at 5:15 PM, Gilles Sadowski <gil...@harfang.homelinux.org> wrote:
>
> >>> I don't know if people are confused about auto-differentiation, I think most people working in numerical analysis are very well aware of what it does. The issue here is that it is a completely separate subject from optimization. In a proper OO design you would not mix the two together. Computation of the derivatives is a separate component from optimization. Optimization only assumes that you can compute one; it shouldn't care or force you to compute one in a specific way. That's bad design. Every component should only have necessary and sufficient requirements for use. Automatic differentiation is not necessary for optimization, so it should not exist in the interface.
> >>
> >> You are right.
> >>
> >>> I would like to add that, if my function is simple enough, I am very capable of running the autodiff package a priori myself. The real need for finite differences is when you cannot analytically compute the derivative. Though I think having a wrapper function around the autodiff package is a nice tool to help developers quickly write code, if they prefer to have quicker development time but slightly slower code.
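For what it is worth, a wrapper of the kind described just above is short to write against the differentiation framework. A sketch only (the GradientAdapter class is hypothetical; it assumes the MultivariateDifferentiableFunction interface currently in the trunk):

    import org.apache.commons.math3.analysis.MultivariateFunction;
    import org.apache.commons.math3.analysis.MultivariateVectorFunction;
    import org.apache.commons.math3.analysis.differentiation.DerivativeStructure;
    import org.apache.commons.math3.analysis.differentiation.MultivariateDifferentiableFunction;

    /** Illustrative adapter (not part of the library): exposes a
     *  DerivativeStructure-based function as separate value and gradient views. */
    public class GradientAdapter {

        private final MultivariateDifferentiableFunction f;

        public GradientAdapter(MultivariateDifferentiableFunction f) {
            this.f = f;
        }

        /** Evaluates f with each coordinate set up as a free parameter, order 1. */
        private DerivativeStructure eval(double[] point) {
            final int n = point.length;
            final DerivativeStructure[] vars = new DerivativeStructure[n];
            for (int i = 0; i < n; i++) {
                vars[i] = new DerivativeStructure(n, 1, i, point[i]);
            }
            return f.value(vars);
        }

        public MultivariateFunction valueFunction() {
            return new MultivariateFunction() {
                public double value(double[] point) {
                    return eval(point).getValue();
                }
            };
        }

        public MultivariateVectorFunction gradientFunction() {
            return new MultivariateVectorFunction() {
                public double[] value(double[] point) {
                    final DerivativeStructure ds = eval(point);
                    final double[] grad = new double[point.length];
                    final int[] orders = new int[point.length];
                    for (int i = 0; i < point.length; i++) {
                        orders[i] = 1;                  // first derivative w.r.t. x_i
                        grad[i] = ds.getPartialDerivative(orders);
                        orders[i] = 0;
                    }
                    return grad;
                }
            };
        }
    }

Such a wrapper keeps the autodiff machinery entirely on the user side: the optimizer only ever sees a MultivariateFunction and a MultivariateVectorFunction.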
> >>> If someone can explain to me why the auto-diff package should be directly forced, or even be provided as an alternative, into the optimization package when a simple wrapper function can do this task while maintaining much cleaner OO design, please tell me.
> >>
> >> You just found the reason and explained it by yourself: it was bad design and I am the one to blame for this. I was (and still am) stupid.
> >
> > I'm not convinced that it was bad design. Maybe I'm completely off now: I thought that the purpose was internal simplification. I don't think that it can be inherently wrong to store any sort of derivatives (be it gradient or Jacobian) into a structure aimed at storing... partial derivatives (i.e. "DerivativeStructure").
> >
> > Please let me know whether the problem is more in-depth than what I described previously (allowing the current way to provide the gradient and Jacobian).
> >
> > I would nevertheless have suggested to delete the method
> >     MultivariateFunction partialDerivative(int k);
> > from "DifferentiableMultivariateFunction".
> >
> >> So I suggest we disconnect differentiation from optimization, but in a way that would let users decide how they provide the differentials. This means I would not like to reintroduce the former interfaces.
> >>
> >> What about having the optimize() methods take two function arguments: a MultivariateFunction for the value and a MultivariateVectorFunction for the gradient? It would be up to the user to select how they compute each function. They could use direct coding, they could use a finite differences wrapper around the first function to implement the second, or they could use our differentiation framework and put one wrapper to extract the value and another one to extract the gradient.
> >
> > That would also be a nice example of usage of the differentiation framework.
> >
> >> What do you think?
> >
> > Yes. Now I got it (I think). IIUC, we could even introduce the new interface before 3.1, and deprecate the older ones (as you wanted to do anyway, IIRC).
> >
> > I'll give it a try in the next days. OK?
> >
> > Gilles
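P.S. The "finite differences wrapper around the first function" mentioned above is equally easy to express against the same two interfaces. A deliberately naive central-difference sketch (the class name and the single fixed step size are only illustrative; a real implementation would need proper step-size handling):

    import org.apache.commons.math3.analysis.MultivariateFunction;
    import org.apache.commons.math3.analysis.MultivariateVectorFunction;

    /** Illustrative only: builds a gradient callback from a value callback
     *  using central differences with one fixed step size. */
    public class FiniteDifferenceGradient implements MultivariateVectorFunction {

        private final MultivariateFunction f;
        private final double h; // step size, chosen by the caller

        public FiniteDifferenceGradient(MultivariateFunction f, double h) {
            this.f = f;
            this.h = h;
        }

        public double[] value(double[] point) {
            final double[] grad = new double[point.length];
            final double[] x = point.clone();
            for (int i = 0; i < point.length; i++) {
                x[i] = point[i] + h;
                final double fPlus = f.value(x);
                x[i] = point[i] - h;
                final double fMinus = f.value(x);
                x[i] = point[i];
                grad[i] = (fPlus - fMinus) / (2 * h); // central difference
            }
            return grad;
        }
    }

Either this or a differentiation-framework wrapper could then be handed in as the gradient argument of the proposed optimize(MultivariateFunction, MultivariateVectorFunction) method (which, to be clear, does not exist yet), or used to cross-check an analytical gradient as Konstantin suggests.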