I forgot to say that there are commonly used benchmark problems for developers of optimization algorithms; they appear routinely in publications to compare different algorithms. I am not personally familiar with them, but they would be easy to google.
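For instance, the Rosenbrock function is one of the classic test problems. A minimal sketch of how such a benchmark could be coded against CM (the class name is mine, purely for illustration):

    import org.apache.commons.math3.analysis.MultivariateFunction;

    // Classical Rosenbrock test function:
    //   f(x, y) = (1 - x)^2 + 100 * (y - x^2)^2,
    // global minimum 0 at (1, 1); hard for many optimizers
    // because of its curved, narrow valley.
    public class Rosenbrock implements MultivariateFunction {
        @Override
        public double value(double[] p) {
            final double x = p[0];
            final double y = p[1];
            return (1 - x) * (1 - x) + 100 * (y - x * x) * (y - x * x);
        }
    }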
On Dec 1, 2012, at 1:31 PM, Gilles Sadowski <gil...@harfang.homelinux.org> wrote:

> Hello.
>
>> Now that I have some time, let me try to make my case clearly. First I
>> want to say that this is not some attack on the automatic-differentiation
>> package. I love automatic-differentiation and symbolic packages. I
>> personally cannot compute a derivative without a computer for the life
>> of me. And I am also glad we are finally starting to agree on
>> something :)
>
> I don't think that we disagreed about the fact that there was a problem
> with the interface to the user application. [_I_ started this thread
> hinting that there was a problem (at least for me as a user, based on my
> use-case).]
>
> [Then there were several misunderstandings about what the problem was and
> how to solve it...]
>
>> This discussion is about figuring out how an incorrect framework and
>> storage affect the performance of an optimization. That is why I am so
>> worried.
>
> I can understand that this is an important point for your use-case.
> There are now two vantage points (at least) on two aspects to consider:
> you seem to have experience where storage matters (while I don't); you
> worry about "small" object allocation (while I don't, anymore).
>
> I'm not saying that your worries do not count; quite the contrary. CM is
> not my "toolbox"; it's a library based on the knowledge of its developers
> (obviously).
> If you bring actual use-cases (i.e. unit tests or real application code
> that can be benchmarked) that show worrying behaviour, it will certainly
> trigger swift corrective action. [This has happened recently.]
>
> In this area (performance), the problem is that intuition (or educated
> guessing, however you want to name it) is not a substitute for actual
> runs, sometimes by a large amount. [When I started to use CM, I raised
> the issue that a 3D vector was immutable (so that I had to create a new
> one whenever its coordinates changed). Surely this was a performance hit!
> Actually it wasn't.]
>
> Again, that does not mean that you are wrong in this case. It's just that
> we cannot jump and change the code every time someone comes up with what
> amounts to "conventional wisdom". If the person comes with a patch that
> readily proves the point (at least marginally, through microbenchmarking),
> then we all benefit.
>
>> So let's start with the basics. A Newton method must compute a descent
>> step by solving a linear equation
>>
>>     H * p = -g,                                            (1)
>>
>> where H is the Hessian, g is the gradient, and p is the desired descent
>> step. What I am about to say also holds for non-linear least-squares
>> methods, where the Hessian is approximated using the Jacobian as
>>
>>     H \approx J^T * J + \lambda * I.
>>
>> Now, if you are not solving the kind of simple optimization problem you
>> keep giving examples for, the Hessian can easily be a very large matrix,
>> 1000x1000 or larger. You had better hope that you are storing H using
>> your linear algebra framework; otherwise, solving eq. (1) is going to
>> take a while.
>> This is actually a very active area of research, and that is why having
>> sparse linear algebra (aren't you removing this? ;) ) and iterative
>> solvers is important to a lot of people.
>
> Yes, we are removing it because it is buggy and _nobody_ stepped up to
> say that it was important for CM users, or to help solve the problems in
> a way consistent with real-life usage of such a feature.
> As Sébastien said, you are warmly welcome to help us find a way to keep
> the feature.
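To make eq. (1) concrete, here is a minimal sketch of a single damped Gauss-Newton step written against the existing math3 linear package (the class and method names are mine, purely for illustration). For a large or sparse H, this dense factorization is exactly the part that becomes prohibitive:

    import org.apache.commons.math3.linear.CholeskyDecomposition;
    import org.apache.commons.math3.linear.MatrixUtils;
    import org.apache.commons.math3.linear.RealMatrix;
    import org.apache.commons.math3.linear.RealVector;

    public class DampedGaussNewton {
        // One descent step: solve H * p = -g, eq. (1), with the
        // least-squares approximation H = J^T * J + lambda * I.
        static RealVector step(RealMatrix j, RealVector g, double lambda) {
            final int n = j.getColumnDimension();
            final RealMatrix h = j.transpose().multiply(j)
                    .add(MatrixUtils.createRealIdentityMatrix(n)
                            .scalarMultiply(lambda));
            // For lambda > 0, H is symmetric positive definite, so a
            // Cholesky factorization is the natural dense solver.
            return new CholeskyDecomposition(h).getSolver()
                    .solve(g.mapMultiply(-1.0));
        }
    }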
>> What you are proposing as good OO style
>
> This discussion really has nothing to do with "OO style", merely code
> reuse; and it was my big misunderstanding (partly because of my lack of
> knowledge, partly because of the lack of documentation on the targeted
> usages of "DerivativeStructure", which IIUC now are outside CM) that I
> believed that the new "MultivariateDifferentiableFunction" was the way
> to go.
>
> [Also, Luc had been moving very fast on this. And I couldn't keep up,
> although I had wondered earlier why this had to impact usage in the
> "optimization" package.]
>
>> is that if someone has a function that they want to optimize, they need
>> to take what is probably already a RealMatrix or double[][], and create
>> 1000x1000 DerivativeStructure objects, that,
>
> IIUC, that would be 1000 "DerivativeStructure" objects (not 1000x1000).
> If so, for this example, using "DerivativeStructure" would lead to about
> a 4% increase in storage memory.
>
>> within the next 10 lines in the optimizer, are going to be converted
>> back to a RealMatrix? Not only have you just fragmented your heap space,
>> but your garbage collector is going crazy, and you are wasting an
>> incredible amount of memory.
>
> This is the kind of assertion that really needs support from actual code.
> [Again, I don't claim to know better; I think that it would really be
> useful to accumulate a list of "dos and don'ts" for Java and CM, in the
> form of reproducible user experience.]
>
>> Now imagine that your Jacobian is actually very simple to compute but
>> large; then this whole conversion back and forth actually takes much
>> longer than the function evaluation.
>
> We are willing to take this into account, but since I cannot see the
> behaviour in my use-case (where the evaluation time overwhelms, by
> several orders of magnitude, all such considerations), I do not have any
> incentive to change what seems to be "efficient enough" (for the actually
> known cases).
> Again, your experience with other use-cases would be very valuable for
> analyzing the efficiency of the CM implementations and improving them,
> if necessary.
>
>> So, why exactly are you insisting on taking this performance penalty?
>
> It's the other way around. Until the performance penalty is proven, we've
> decided that it's up to the developer willing to do the work to decide.
> Again, if there is a patch, it makes the matter much easier to decide on.
>
> I admit that we decided to change the internal workings of the
> optimizers, so _we_ should have proved that it does not impact usability.
> Hence, again, the usefulness of having a test suite of "real"
> applications which could be benchmarked regularly in order to detect
> performance regressions induced by otherwise welcome changes.
> [I raised that issue several times on the ML.]
>
>> You claim that the optimizer can work better if it gets a
>> DerivativeStructure; why?
>
> This was (supposed to be) an improvement of the _internal_ design.
> [Several weeks ago, Luc posted the problems he was encountering.]
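To illustrate the round trip I am worried about, here is a minimal sketch, assuming I read the DerivativeStructure API correctly, of packing an already-computed gradient into a DerivativeStructure only to pull the same numbers back out:

    import org.apache.commons.math3.analysis.differentiation.DerivativeStructure;

    public class GradientRoundTrip {
        // Pack a value and its already-computed gradient into a
        // first-order DerivativeStructure (one object per component).
        static DerivativeStructure wrap(double value, double[] gradient) {
            final int n = gradient.length;
            final double[] packed = new double[n + 1];
            packed[0] = value;                            // f
            System.arraycopy(gradient, 0, packed, 1, n);  // df/dx_1 .. df/dx_n
            return new DerivativeStructure(n, 1, packed);
        }

        // ... and pull the same numbers back out a few lines later.
        static double[] unwrap(DerivativeStructure ds) {
            final int n = ds.getFreeParameters();
            final double[] gradient = new double[n];
            final int[] orders = new int[n];
            for (int i = 0; i < n; i++) {
                orders[i] = 1;
                gradient[i] = ds.getPartialDerivative(orders);
                orders[i] = 0;
            }
            return gradient;
        }
    }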
>> Where does the fact that you are holding a DerivativeStructure come into
>> effect for a Newton method? Can you provide any literature in that
>> regard? The classical optimization bottleneck, if not the actual
>> function evaluation, is eq. (1). The optimization methodology is built
>> on top of years of research in computational linear algebra, and does
>> not care how the matrices are actually computed. Efficient memory usage
>> and speed are critical here, because in Newton optimizations eq. (1) is
>> solved thousands of times. The goal of the Jacobian is not to store
>> derivatives; it is to store a matrix of numbers that allows you to
>> quadratically approximate your target function.
>
> OK. Then Luc's proposal seems appropriate.
> There would be new interfaces defining the "optimize" method
> appropriately. For algorithms that need the gradient, it must be provided
> in an additional argument, of type "MultivariateVectorFunction"; for
> those that need the Jacobian, the argument would be of type
> "MultivariateMatrixFunction".
> Do you agree?
>
>> You guys are focusing on people using finite differences and trying to
>> optimize a Newton method to use finite differences. That is not the main
>> purpose of a Newton method. If you want a method that dynamically
>> adjusts the finite-difference step size, you should switch to BOBYQA,
>> which was designed for this case.
>
> Another misunderstanding, probably. [The "stepSize" discussion was sort
> of a digression.]
>
>> Derivatives can be computed by a computer using a symbolic toolbox a
>> priori (something I was referring to when I accidentally said
>> auto-differentiation). They can also be efficiently simplified by that
>> toolbox before being pasted back into your program. Auto-diff could also
>> be an important tool for people whose problem is simple or who are not
>> concerned with optimal efficiency. This can easily be handled by a
>> wrapper and has nothing to do with Newton optimization.
>
> Maybe we should also change the "NewtonSolver" (in package
> "o.a.c.m.analysis.solvers"). [In the same way we'll do for the optimizer
> with derivatives. Actually those two packages were improved in parallel,
> so that the interfaces would be very similar from a usage point of view.]
>
>> If you want to talk about proper OO design and internal simplification,
>> then you should focus on being able to pass a linear solver into your
>> optimization method. This way you allow people to use iterative and
>> sparse solvers when needed.
>
> This is a new feature; contributions are welcome. [CM is primarily
> developed by people who use (part of) it; if they don't need a feature,
> or don't know how to implement it, or do not have the time to implement
> it, it won't be implemented.]
>
>> If you are concerned about people getting derivatives wrong, you can do
>> what MATLAB does, which is to add an option to check the derivatives
>> against finite differences when debugging.
>
> IMHO, the top-priority feature to help in debugging is to introduce
> logging statements!
>
>> This brings me to my second issue. There are important problems where
>> computing the derivatives together with the actual function evaluation
>> is *significantly* faster than computing each alone. I am not clear why
>> I am hitting such resistance with this. [...]
>
> [If you think that you were met with strong resistance, I'd suggest that
> you look in the archives for the discussions about exceptions in CM...]
>
> Unless I'm mistaken, CM does not prevent you from writing a class that
> takes advantage of combining the computations.
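Here is a toy sketch of the kind of case I mean; the model and class names are hypothetical. For f_i(a, b) = a * exp(b * t_i), the exp() calls dominate the cost, and a combined evaluation computes each of them once instead of twice:

    // Toy model f_i(a, b) = a * exp(b * t_i): the exp() terms dominate
    // the cost and are shared between the values and the Jacobian.
    public class ExpModel {
        private final double[] t;

        ExpModel(double[] t) {
            this.t = t;
        }

        // Single pass: each exp() is evaluated once, then reused for both
        // the function value and the two partial derivatives.
        void valueAndJacobian(double a, double b, double[] f, double[][] jac) {
            for (int i = 0; i < t.length; i++) {
                final double e = Math.exp(b * t[i]);  // expensive shared term
                f[i] = a * e;                         // f_i
                jac[i][0] = e;                        // df_i/da
                jac[i][1] = a * t[i] * e;             // df_i/db
            }
        }
    }

With separate value() and Jacobian callbacks, every exp(b * t_i) would be computed twice per iteration.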
>> What I suggested as function interfaces was just an initial quick
>> suggestion that I would be happy for anyone to change in a way that
>> everyone feels positive about. I just don't want my second issue to be
>> ignored like a non-issue.
>
> I'm getting confused. Could you please restate what those first and
> second issues were?
>
> Thanks,
> Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org