Hi,

I don't know that people are confused about auto-differentiation; I think
most people working in numerical analysis are very well aware of what it
does. The issue here is that it is a completely separate subject from
optimization. In a proper OO design you would not mix the two together:
computation of the derivatives is a separate component from optimization.
Optimization only assumes that you can compute a derivative; it shouldn't
care about, or force you to use, a specific way of computing it. That's bad
design. Every component should impose only the necessary and sufficient
requirements for its use. Automatic differentiation is not necessary for
optimization, so it should not appear in the interface.

I would like to add that if my function is simple enough, I am perfectly
capable of running an autodiff package on it myself, a priori. The real need
for finite differences is when you cannot compute the derivative
analytically. That said, I think a wrapper function around an autodiff
package is a nice tool to help developers write code quickly, if they prefer
faster development time at the cost of slightly slower code.

If someone can explain to me why an auto-diff package should be forced into,
or even offered as an alternative within, the optimization package when a
simple wrapper function can do the same task while keeping a much cleaner OO
design, please tell me.
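
To make the point concrete, here is the kind of wrapper I have in mind: a
minimal sketch against the pre-3.1 interface, with a naive forward-difference
Jacobian and a fixed step size (both choices are mine for illustration; a
real implementation would pick the step per component).
-----
import org.apache.commons.math3.analysis.DifferentiableMultivariateVectorFunction;
import org.apache.commons.math3.analysis.MultivariateMatrixFunction;
import org.apache.commons.math3.analysis.MultivariateVectorFunction;

/** Sketch: wraps a plain vector function, supplying a forward-difference Jacobian. */
public class FiniteDifferenceWrapper implements DifferentiableMultivariateVectorFunction {

    private final MultivariateVectorFunction f;
    private final double h; // fixed step size, for illustration only

    public FiniteDifferenceWrapper(MultivariateVectorFunction f, double h) {
        this.f = f;
        this.h = h;
    }

    public double[] value(double[] point) {
        return f.value(point);
    }

    public MultivariateMatrixFunction jacobian() {
        return new MultivariateMatrixFunction() {
            public double[][] value(double[] point) {
                final double[] y0 = f.value(point);
                final double[][] jac = new double[y0.length][point.length];
                for (int j = 0; j < point.length; ++j) {
                    final double[] shifted = point.clone();
                    shifted[j] += h; // forward difference in component j
                    final double[] y1 = f.value(shifted);
                    for (int i = 0; i < y0.length; ++i) {
                        jac[i][j] = (y1[i] - y0[i]) / h;
                    }
                }
                return jac;
            }
        };
    }
}
-----
The optimizer never needs to know how the Jacobian was obtained; swapping in
an autodiff-generated Jacobian would be a one-line change in user code.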

Thanks,
Konstantin

On Nov 30, 2012, at 2:06 PM, Gilles Sadowski <gil...@harfang.homelinux.org> 
wrote:

> Hi Luc.
> 
>>> As a user of the optimization algorithms I am completely confused by
>>> the change. It seems different from how optimization functions are
>>> typically used and seems to be creating a barrier for no reason.
>> 
>> The reason is that the framework has been done for several uses, not
>> only optimization.
> 
> Let's admit that the "autodiff" subject is not yet well-known, and viewed
> from the CM "optimization" package it looks very complex: some examples
> would be welcome in the user guide.
> The "DerivativeStructure" itself is not easily grasped (e.g. there are
> many constructors and one does not know which are "user-oriented" and which
> are more linked to the internal structure ("DSCompiler" which is even more
> cryptic at first glance).
> 
> Of course, _I_ just have to start reading about the subject in order to
> understand; you are not expected to provide the background within the
> Javadoc! :-)
> 
>>> 
>>> I am not clear why you can't just leave the standard interface to an
>>> optimizer be a function that computes the value and the Jacobian (in
>>> case of least-squares), the gradient (for quasi-Newton methods) and
>>> if you actually have a full newton method, also the Hessian.
>>> 
>>> If the user wants to compute the Jacobian (gradient) using finite
>>> differences they can do it themselves, or wrap it into a class that
>>> you can provide them that will compute finite differences using the
>>> desired algorithm.
>> 
>> This is already what many people do, and it can be done with both the
>> older and the newer API. Nothing prevents users from using finite
>> differences in the objects they pass to the optimizer.
>> 
>>> 
>>> Also I can imagine a case where computation of the Jacobian can be sped
>>> up if the function value is known, but not if two separate functions
>>> handle the derivatives and the actual function value. For example
>>> f^2(x). You can probably devise some kind of caching scheme, but still.
>>> 
>>> Maybe I am missing something, but I spent about an hour trying to
>>> figure out how to change my code to adapt to your new framework. I still
>>> haven't figured it out.
>> 
>> I can easily understand. It is really new and needs some polishing and
>> documenting. I am sorry for that.
> 
> The implementation was a huge amount of work already. Thanks for that.
> 
>> 
>> In your case, if you already have two different functions, you can merge
>> them to create a MultivariateDifferentiableVectorFunction and pass this
>> to the optimizer. See how
>> FunctionUtils.toMultivariateDifferentiableVectorFunction
>> does it, starting from a DifferentiableMultivariateVectorFunction.
>> 
>> See below about the deprecation of this converter method, though.
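>> 
>> For reference, here is a minimal sketch of such a merge (not the actual
>> converter code; it assumes derivation order 1 and that the point elements
>> are the canonical variables, and the class name is made up):
>> -----
>> import org.apache.commons.math3.analysis.MultivariateMatrixFunction;
>> import org.apache.commons.math3.analysis.MultivariateVectorFunction;
>> import org.apache.commons.math3.analysis.differentiation.DerivativeStructure;
>> import org.apache.commons.math3.analysis.differentiation.MultivariateDifferentiableVectorFunction;
>> 
>> public class MergedFunction implements MultivariateDifferentiableVectorFunction {
>> 
>>     private final MultivariateVectorFunction f;          // computes the value
>>     private final MultivariateMatrixFunction fJacobian;  // computes the Jacobian
>> 
>>     public MergedFunction(MultivariateVectorFunction f,
>>                           MultivariateMatrixFunction fJacobian) {
>>         this.f = f;
>>         this.fJacobian = fJacobian;
>>     }
>> 
>>     public double[] value(double[] point) {
>>         return f.value(point);
>>     }
>> 
>>     public DerivativeStructure[] value(DerivativeStructure[] point) {
>>         final int n = point.length;
>>         final double[] x = new double[n];
>>         for (int i = 0; i < n; ++i) {
>>             x[i] = point[i].getValue();
>>         }
>>         final double[] y     = f.value(x);
>>         final double[][] jac = fJacobian.value(x);
>>         final DerivativeStructure[] result = new DerivativeStructure[y.length];
>>         for (int i = 0; i < y.length; ++i) {
>>             // pack the value first, then the n first-order partial
>>             // derivatives, matching the "all derivatives" constructor
>>             // layout for order 1
>>             final double[] data = new double[1 + n];
>>             data[0] = y[i];
>>             System.arraycopy(jac[i], 0, data, 1, n);
>>             result[i] = new DerivativeStructure(n, 1, data);
>>         }
>>         return result;
>>     }
>> }
>> -----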
>> 
>> Note that the new API is simply another way to represent the same
>> information. The former way was limited to first derivatives and was
>> really awkward when multiple dimensions were involved: as you derive
>> with respect to several variables, a real function becomes a vector
>> function (a gradient), a vector function becomes a matrix function, and
>> it quickly becomes intractable. If you start thinking about second
>> derivatives, it is worse. It was also difficult when you combined
>> functions: for example if you compute f(u), it looks like a univariate
>> function, but if u = g(x, y, z), it is really a multivariate function.
>> When computing differentials, you run into problems pushing all the
>> partial differentials du/dx, du/dy, du/dz through to the df/du function;
>> there is a bottleneck. The only solution people had was to do the
>> composition outside by themselves. The new API avoids that: it doesn't
>> care whether u is a simple canonical variable or is itself a function
>> resulting from some former computation.
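>> 
>> As a small illustration (a sketch, not from the CM sources): with
>> DerivativeStructure, u can itself carry its partial derivatives, so the
>> composition f(u) propagates df/dx, df/dy, df/dz automatically:
>> -----
>> // three canonical variables x, y, z, derivation order 1
>> DerivativeStructure x = new DerivativeStructure(3, 1, 0, 2.0);
>> DerivativeStructure y = new DerivativeStructure(3, 1, 1, 3.0);
>> DerivativeStructure z = new DerivativeStructure(3, 1, 2, 5.0);
>> 
>> // u = g(x, y, z) = x * y + z, carrying du/dx, du/dy, du/dz
>> DerivativeStructure u = x.multiply(y).add(z);
>> 
>> // f(u) = sin(u): the chain rule is applied internally
>> DerivativeStructure f = u.sin();
>> double dfdx = f.getPartialDerivative(1, 0, 0); // cos(x*y + z) * y
>> double dfdy = f.getPartialDerivative(0, 1, 0); // cos(x*y + z) * x
>> double dfdz = f.getPartialDerivative(0, 0, 1); // cos(x*y + z)
>> -----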
>> 
>> For information, the original proposal from July about the
>> differentiation framework is here:
>> <http://mail-archives.apache.org/mod_mbox/commons-dev/201207.mbox/%3C500C2BBF.2020806%40spaceroots.org%3E>.
>> 
>>> 
>>> On Nov 30, 2012, at 11:11 AM, Gilles Sadowski
>>> <gil...@harfang.homelinux.org> wrote:
>>> 
>>>> Hello.
>>>> 
>>>> Context:
>>>> 1. A user application computes the Jacobian of a
>>>> multivariate vector function (the output of a simulation) using
>>>> finite differences.
>>>> 2. The covariance matrix is obtained from
>>>> "AbstractLeastSquaresOptimizer". In the new API, the Jacobian is
>>>> supposed to be "automatically" computed from the
>>>> "MultivariateDifferentiableVectorFunction" objective function.
>>>> 3. The converter "toMultivariateDifferentiableVectorFunction" (in
>>>> "FunctionUtils"), from "DifferentiableMultivariateVectorFunction"
>>>> to "MultivariateDifferentiableVectorFunction", is deprecated.
>> 
>> It was both introduced in 3.1 and deprecated at the same time. It's not
>> the converter per se which is considered wrong, it is the
>> DifferentiableMultivariateVectorFunction interface which is deprecated,
>> hence the converter using this interface is deprecated as a consequence.
> 
> I know.
> 
> However, you suggested above that we look at the converter code in order to
> figure out how to switch to the new API. Of course, that's what I've done
> all day, which made me wonder: If users need to do the same over and over
> again, why not keep the code in CM indeed?
> 
>> 
>>>> 4. A "FiniteDifferencesDifferentiator" operator currently exists
>>>> but only for univariate functions.
>>>> Unles I'm mistaken, a direct extension to multiple variables won't
>>>> do:
>>>> * because the implementation uses the symmetric formula, but in
>>>> some cases (bounded parameter range), it will fail, and
>>>> * because of the floating point representation of real values, the 
>>>> delta for sampling the function should depend on the magnitude of 
>>>> of the parameter value around which the sampling is done whereas
>>>> the "stepSize" is constant in the implementation.
>>>> 
>>>> Questions:
>> 
>>>> 1. Shouldn't we keep the converters so that users can keep their
>>>> "home-made" (first-order) derivative computations? [Converters
>>>> exist for gradient of "DifferentiableMultivariateFunction" and
>>>> Jacobian of "DifferentiableMultivariateVectorFunction".]
>> 
>> We could decide to not deprecate the older interface and hence not
>> deprecate these converters. We could also set up a converter that would
>> not use a DifferentiableMultivariateVectorFunction but would instead take
>> two parameters: one MultivariateVectorFunction and one
>> MultivariateMatrixFunction.
>> 
>> I'm not sure about this. I would really like to get rid of the former
>> interface, which is confusing now that we have introduced the new one.
> 
> IIUC, the "problem" or "barrier" is (as I hinted at in my previous post)
> that the "DerivativeStructure" does not bring anything when we just fill it
> with precomputed values. (is that correct?).
> It just makes the interface very unfamiliar (using a complex structure)
> where "double[]" and "double[][]" were fairly straightforward to represent
> "multivariate" values and Jacobian, respectively. [Maybe this point can be
> alleviated by an extensive example in the user guide.]
> 
>> 
>>>> 2. Is it worth creating the multivariate equivalent of the
>>>> univariate "FiniteDifferencesDifferentiator", assuming that higher
>>>> orders will rarely be used because of
>>>> * the loss of accuracy (as stated in the doc), and/or
>>>> * the sometimes prohibitively expensive number of evaluations of
>>>> the objective function? [1]
>> 
>> Yes, I think it is worth doing, using separate step sizes for each
>> component (as they may have very different behaviour and units).
> 
> Indeed (see below).
> 
>> 
>>>> 3. As current CM optimization algorithms need only the gradient or 
>>>> Jacobian, would it be sufficient to only provide a limited
>>>> (two-point first-order) finite differences operator (with the
>>>> possibility to choose a "symmetric", "forward" or "backward"
>>>> formula for each parameter)?
>> 
>> I'm not sure. As I explained above, the framework was not done with only
>> the optimizers in mind (they are an important use of differentials, but
>> are not the only one). This framework is also used on its own as a
>> [math] feature, directly from user code.
>> 
>> The underlying computation for derivatives using finite differences can
>> be easily improved to use non-symmetric points, even for high-order
>> derivatives. It is sufficient to compute the abscissae correctly by
>> changing the single line:
>> 
>>  final double xi = t0 + stepSize * (i - 0.5 * (nbPoints - 1));
> 
> What about:
>  double xi = t0 * (1 + stepSize * (i - 0.5 * (nbPoints - 1)))
> ?
> [In order to automatically adapt to the accuracy loss due to floating
> point. I.e. the "stepSize" would be relative.]
> 
>> 
>> Once the xi are computed, the remaining formulas are valid; they do not
>> assume the xi are centered around the point (they need not even be
>> regularly spaced). The only constraint is that all the xi are different
>> from each other.
> 
> Personally, I'd be pleased to have more documentation in the code (around
> the part that mentions "differences diagonal arrays", "interpolation" and
> "monomial") or just a pointer to where that computation is explained.
> 
>> 
>> So it would be possible to first add an enum, Formula, with values
>> CENTERED, BACKWARD, FORWARD, as a constructor parameter to the existing
>> univariate FiniteDifferencesDifferentiator, and then to add another
>> differentiator for multivariate functions, using an array of step sizes
>> and an array of Formula values so that all components are computed
>> according to their constraints. I think we could use the same number of
>> points for all components, but it would also be possible to have a
>> different number of points if needed.
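>> 
>> Something like this, perhaps (a hypothetical skeleton, not existing code):
>> -----
>> public enum Formula { CENTERED, FORWARD, BACKWARD }
>> 
>> public class MultivariateFiniteDifferencesDifferentiator {
>> 
>>     private final int nbPoints;        // shared by all components
>>     private final double[] stepSizes;  // one step size per component
>>     private final Formula[] formulas;  // one formula per component
>> 
>>     public MultivariateFiniteDifferencesDifferentiator(int nbPoints,
>>                                                        double[] stepSizes,
>>                                                        Formula[] formulas) {
>>         this.nbPoints  = nbPoints;
>>         this.stepSizes = stepSizes.clone();
>>         this.formulas  = formulas.clone();
>>     }
>> 
>>     // would wrap a MultivariateVectorFunction into a
>>     // MultivariateDifferentiableVectorFunction, sampling component j
>>     // with stepSizes[j] according to formulas[j]
>> }
>> -----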
>> 
>> Would you like me to do that?
> 
> I indeed tried to create the "multivariate" version, before getting stuck
> on even more basic things.
> E.g. in trying to understand "DerivativeStructure", I went to the unit tests
> and found this ("testToMultivariateDifferentiableFunction()" in
> "FunctionUtilsTest"):
> -----
>    MultivariateDifferentiableFunction converted
>        = FunctionUtils.toMultivariateDifferentiableFunction(hypot);
>    for (double x = 0.1; x < 0.5; x += 0.01) {
>        for (double y = 0.1; y < 0.5; y += 0.01) {
>            DerivativeStructure[] t = new DerivativeStructure[] {
>                new DerivativeStructure(3, 1, x, 1.0, 2.0, 3.0), // <--
>                new DerivativeStructure(3, 1, y, 4.0, 5.0, 6.0)  // <--
>            };
>            DerivativeStructure h = DerivativeStructure.hypot(t[0], t[1]);
>            Assert.assertEquals(h.getValue(),
>                                converted.value(t).getValue(), 1.0e-10);
>            Assert.assertEquals(h.getPartialDerivative(1, 0, 0),
>                                converted.value(t).getPartialDerivative(1, 0, 0),
>                                1.0e-10);
>            Assert.assertEquals(h.getPartialDerivative(0, 1, 0),
>                                converted.value(t).getPartialDerivative(0, 1, 0),
>                                1.0e-10);
>            Assert.assertEquals(h.getPartialDerivative(0, 0, 1),
>                                converted.value(t).getPartialDerivative(0, 0, 1),
>                                1.0e-10);
>        }
>    }
> -----
> I don't understand the lines which I marked with an arrow.
> Why 3 variables, why that constructor (cf. remark above)? In fact, why is it
> not:
> -----
>     new DerivativeStructure(2, 1, 0, x),
>     new DerivativeStructure(2, 1, 1, y)
> -----
> ? [And consequent changes in the calls to "getPartialDerivative(int...)".]
> 
> I didn't figure out why the test passes, or what the result of the
> computation should be: "h" does not contain the first partial derivatives
> of "hypot" at the given (x, y) ...
> 
> 
> Thanks for bits of enlightenment,
> Gilles
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
> 


