Re: [MATH] What is the derivative of 0^x

Ajo Fod Wed, 28 Aug 2013 07:58:03 -0700

To define things precisely:
y = f(a,x) = |a|^x

Can we agree that:
df(a,x)/dx -> 0 when a -> 0 and x > 0 [NOTE: x > 0]


If this is acceptable, we get the very useful property that df(a,x)/dx is
defined and continuous for all a provided x > 0, because the function
definition uses the modulus of a. In optimization, with this patch at
|a| = 0, I can let an optimizer search the whole real line without worrying
about a = 0; otherwise I have to guard against a = 0 explicitly. It seems
unnecessary to add a constraint forcing |a| > 0 when I already have a
constraint for x > 0.
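
A minimal sketch of this patch in plain Java, assuming a hypothetical helper
class AbsPowDerivative that is not part of Commons Math. It uses the analytic
form ln|a| * |a|^x for |a| > 0 and substitutes the limit value 0 at a = 0 when
x > 0. Returning NaN for a = 0 and x <= 0 is only a placeholder choice, since
that case lies outside the constraint x > 0:

```java
/** Hypothetical helper, not part of Commons Math. */
final class AbsPowDerivative {

    /**
     * Derivative of |a|^x with respect to x.
     * For |a| > 0 this is ln|a| * |a|^x.
     * For a == 0 and x > 0 the limit value 0 is substituted (the proposed patch).
     * For a == 0 and x <= 0 no patch is applied and NaN is returned (placeholder).
     */
    static double dAbsPowDx(final double a, final double x) {
        final double absA = Math.abs(a);
        if (absA == 0.0) {
            return x > 0.0 ? 0.0 : Double.NaN;
        }
        return Math.log(absA) * Math.pow(absA, x);
    }

    public static void main(final String[] args) {
        final double x = 2.0;
        // Values on both sides of a = 0, and a = 0 itself.
        for (final double a : new double[] {-0.5, -1e-3, 0.0, 1e-3, 0.5}) {
            System.out.format("a=%f  d|a|^x/dx=%f%n", a, dAbsPowDx(a, x));
        }
    }
}
```

With this guard the derivative can be evaluated at any real a, as long as x
stays strictly positive.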

Cheers,
Ajo.



On Tue, Aug 27, 2013 at 1:49 PM, Luc Maisonobe <[email protected]> wrote:

> Hi Ajo,
>
> On 27/08/2013 16:44, Ajo Fod wrote:
> > Thanks for the constant structure.
> >
> > No. The limit value when x->0+ is 1, not 0.
> >
> > I agree with this. I was just going for the derivatives = 0.
> >
> >
> >> The nth derivative of a^x can be computed analytically as ln(a)^n a^x,
> >> so the initial slope at x=0 is simply ln(a), positive for a > 1, zero
> >> for a = 1, negative for 0 < a < 1 with a limit at -infinity when
> >> a -> 0+.
> >>
> >
> > Let's think about this for a second:
> > Derivative of |a|^x wrt x at x=2.0 for various values of a
> > Derivative@0.031250=-0.003384
> > Derivative@0.015625=-0.001015
> > Derivative@0.007813=-0.000296
> > Derivative@0.003906=-0.000085
> > Derivative@0.001953=-0.000024
> > ... tends to 0
>
> yes, because 2.0 > 0.
>
> >
> > Derivative of |a|^x wrt x at x=0.5 for various values of a
> > Derivative@0.031250=-0.612555
> > Derivative@0.007813=-0.428759
> > Derivative@0.001953=-0.275612
> > Derivative@0.000488=-0.168418
> > Derivative@0.000122=-0.099513
> > Derivative@0.000031=-0.057407
> > Derivative@0.000008=-0.032528
> > Derivative@0.000002=-0.018176
> > ... tends to 0 when a->0
>
> yes because 0.5 > 0.
>
> >
> > The code I used for the print outs is:
> >     static final double EPS = 0.0001d;
> >
> >     public static void main(final String[] args) {
> >         final double x = 0.5d;
> >         int from = 5;
> >         int to = 20;
> >         System.out.println("Derivative of |a|^x wrt x at x=" + x);
> >         for (int p = from; p < to; p+=2) {
> >             double a = Math.pow(2d, -p);
> >             final double calc = (Math.pow(a, x + EPS) - Math.pow(a, x)) / EPS;
> >             System.out.format("Derivative@%f=%f \n", a, calc);
> >         }
> >     }
> >
> > As for the x=0 case:
> > 1^0 = 1
> > 0.5^0 = 1
> > 0.0001^0 = 1
> > 0^0 is technically undefined, but  1 is a good definition:
> > http://www.math.hmc.edu/funfacts/ffiles/10005.3-5.shtml
>
> Yes.
>
> > ... so a good value for the derivative d(a^x)/dx in the limit x->0 and
> > a->0 is 0
>
> I don't agree. What you wrote in the lines above is another way to say
> what I wrote in my previous message: the value at x=0 is always y=1, and
> the value for x > 0 tends to 0 as a->0+.
>
> So the function always starts at 1 and dives more and more steeply as a
> becomes smaller, and the derivative at 0 becomes more and more negative,
> up to -infinity, *not* 0.
>
> The function is ill-behaved and the fact the derivative is infinite is
> consistent with this ill-behaviour.
>
> The definition of the derivative is :
>
>  f'(x) = lim (f(x+h) - f(x))/h when h -> 0+
>
> when f(x) = 0^x and assuming 0^0 = 1 as you have agreed above, this gives:
>
>  f'(0) = lim (0^(0+h) - 0^0)/h = lim (0 - 1)/h = -infinity
>
> which is exactly the same result as computing for a non-zero a and then
> reducing it: d(a^x)/dx = ln(a) a^x = ln(a) when x=0, which diverges to
> -infinity when a converges to 0.
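
The two computations above can be checked numerically with java.lang.Math
alone. The sketch below (class and method names are hypothetical) relies on
Math.pow(0.0, 0.0) returning 1.0 and Math.log(0.0) returning negative
infinity, both of which are documented behaviour of the standard library:

```java
/** Hypothetical demo class, not part of Commons Math. */
final class ZeroPowSlope {

    /** Forward difference (0^h - 0^0) / h; Math.pow(0.0, 0.0) is 1.0. */
    static double forwardDifferenceAtZero(final double h) {
        return (Math.pow(0.0, h) - Math.pow(0.0, 0.0)) / h;
    }

    /** Analytic slope ln(a) * a^0 = ln(a) of a^x at x = 0. */
    static double slopeAtZero(final double a) {
        return Math.log(a);
    }

    public static void main(final String[] args) {
        // The finite difference equals -1/h, so it grows without bound as h -> 0+.
        for (int k = 1; k <= 5; k++) {
            final double h = Math.pow(10.0, -k);
            System.out.format("h=%g  (0^h - 0^0)/h=%g%n", h, forwardDifferenceAtZero(h));
        }
        // ln(a) becomes more and more negative as a -> 0+, and is -Infinity at a = 0.
        for (final double a : new double[] {0.5, 1e-3, 1e-9, 0.0}) {
            System.out.format("a=%g  ln(a)=%f%n", a, slopeAtZero(a));
        }
    }
}
```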
>
> >
> >
> > As mentioned earlier, I think the cause for this is that log|a| ->
> > -infinity more slowly than |a|^x -> 0 as |a| -> 0.
>
> But a^x does *not* converge to 0 for x = 0! a^0 is always 1 (rigorously)
> regardless of the value of a as long as it is not 0, and then when we
> change a we can also consider that the limit is 1 when a -> 0. This
> convention is well accepted. It is implemented in the Java standard
> Math.pow function, and we followed this trend. This is the reason why
> the function becomes steeper and steeper as a becomes smaller. In the
> end, it is a discontinuous function (and hence should not be
> differentiable, or it is differentiable only if we use extended real
> numbers with infinity added).
>
> This is the heart of the ill-behaviour of 0^0. We want to compute it as
> a limit value for a^b when both parameters converge to 0, but we get a
> different result if we first set a fixed and converge b to 0, and later
> reduce a down to zero (your approach), and when we do the opposite. In
> one case we get 0, in the other case we get 1.
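
Written as iterated limits, using x for the exponent as in the rest of this
thread and restricting to a > 0 and x > 0, the two orders of evaluation
described above are:

```latex
\lim_{a \to 0^+} \Bigl( \lim_{x \to 0^+} a^x \Bigr) = \lim_{a \to 0^+} 1 = 1,
\qquad
\lim_{x \to 0^+} \Bigl( \lim_{a \to 0^+} a^x \Bigr) = \lim_{x \to 0^+} 0 = 0.
```

The inner limit is taken first in each case, so the double limit of a^x at
(0, 0) does not exist and any value assigned to 0^0 is a convention.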
>
> Let's put it another way:
> If we consider the derivative f'(0) should be 0, then the value f(0)
> should also be considered equal to zero. This would mean as soon as we
> get a tiny non-zero a (say the smallest number that can be represented
> as a double), then f(0) would jump from 0 to 1 instantly, and f'(0)
> would jump from 0 to -infinity instantly. So we would have at a = 0 an
> initial null derivative, then a jump to a very negative derivative as a
> leaves 0, then the derivative would become less and less negative as a
> increases up to 1, at a=1 the derivative would again be 0, then the
> derivative would continue to increase and become positive as a grows
> larger than 1 (all these derivatives are computed at x=0, and as written
> previously, they are simply equal to log(a)).
>
> To summarize, the two choices are:
>  1) - first considering a fixed a, strictly positive,
>     - then looking globally at the function a^x for all values x>=0,
>     - then reducing a, noting that all functions start at the same
>       point x=0, y=1 and the derivatives become more and more negative
>       as the function becomes more and more ill-behaved
>  2) - first considering a fixed x, strictly positive,
>     - then reducing a and noting that the limit value is 0 for all x > 0,
>     - then building a function by packing all the x>0, which is very
>       smooth as it is identically 0 for all x>0
>     - finally adding the limit value at x=0, which in this case would
>       be 0 (and the derivative would also be 0).
>
> It seems well accepted that the value of 0^0 should be set to 1, and as
> a consequence the corresponding derivative with respect to x should be
> set to -infinity.
>
> I fully agree it is not a perfect solution; it is an arbitrary choice.
> However, this choice is consistent with all the implementations of the
> pow function I have seen (i.e. 0^0 set to 1 instead of 0).
>
> Your approach is not wrong, it is as valid as the other one. It is
> simply not the common choice.
>
> I would say an even better choice would have been to say 0^0 *is not*
> defined and even the value should be set to NaN (not even speaking of
> the derivative).
>
> Does this seem acceptable to you?
>
> best regards,
> Luc
>
> >
> > Cheers,
> > Ajo.
> >
> >
> >> The limit curve corresponding to a = 0 is therefore a singular function
> >> with f(0) = 1 and f(x) = 0 for all x > 0. The fact f(0) = 1 and not 0 is
> >> consistent with the derivative being negative infinity, as by definition
> >> the derivative is the limit of [f(0+h) - f(0)] / h when h->0+, as the
> >> finite difference is -1/h.
> >>
> >>>                 }
> >>>             }else{
> >>>                 for (int i = 0; i < function.length; ++i) {
> >>>                     function[i] = Double.NaN;
> >>>                 }
> >>
> >> This alternative case is a good improvement, thanks for it. I forgot to
> >> handle negative cases properly. I have therefore changed the code
> >> (committed as r1517788) with this improvement, together with several
> >> test cases.
> >>
> >>>             }
> >>>         } else {
> >>>
> >>>
> >>> in place of :
> >>>
> >>>         if (a == 0) {
> >>>             if (operand[operandOffset] == 0) {
> >>>                 function[0] = 1;
> >>>                 double infinity = Double.POSITIVE_INFINITY;
> >>>                 for (int i = 1; i < function.length; ++i) {
> >>>                     infinity = -infinity;
> >>>                     function[i] = infinity;
> >>>                 }
> >>>             }
> >>>         } else {
> >>>
> >>>
> >>> PS: I think you made a change to DSCompiler.pow too. If so, what happens
> >>> when a=0 & x!=0 in that function?
> >>
> >> No, I didn't change the other signatures of the pow function. So the
> >> value should be OK (i.e. 1) but all derivatives, including the first
> >> one, should be NaN. What the new function brings is a correct negative
> >> infinity first derivative at the singularity point, better accuracy for
> >> non-singular points, and possibly faster computation.
> >>
> >> best regards,
> >> Luc
> >>
> >>>
> >>>
> >>> On Mon, Aug 26, 2013 at 12:38 AM, Luc Maisonobe <[email protected]>
> >> wrote:
> >>>
> >>>>
> >>>>
> >>>>
> >>>> Ajo Fod <[email protected]> wrote:
> >>>>> Are you saying you patched the code? Can you provide the link?
> >>>>
> >>>> I committed it in the development version. You just have to update your
> >>>> checked out copy from either the official Apache subversion repository
> >>>> or the git mirror we talked about in a previous thread.
> >>>>
> >>>> The new method is a static one called pow, taking a and x as arguments
> >>>> and returning a^x. It is not to be confused with the non-static methods
> >>>> that take only the power as argument (either int, double or
> >>>> DerivativeStructure) and use the instance as the base to apply the power
> >>>> on.
> >>>>
> >>>> Best regards,
> >>>> Luc
> >>>>
> >>>>>
> >>>>> -Ajo
> >>>>>
> >>>>>
> >>>>> On Sun, Aug 25, 2013 at 1:20 PM, Luc Maisonobe <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>> On 24/08/2013 11:24, Luc Maisonobe wrote:
> >>>>>>> On 23/08/2013 19:20, Ajo Fod wrote:
> >>>>>>>> Hello,
> >>>>>>>
> >>>>>>> Hi Ajo,
> >>>>>>>
> >>>>>>>>
> >>>>>>>> This shows one way of interpreting the derivative for strictly
> >>>>>>>> positive numbers.
> >>>>>>>>
> >>>>>>>>     public static void main(final String[] args) {
> >>>>>>>>         final double x = 1d;
> >>>>>>>>         DerivativeStructure dsA = new DerivativeStructure(1, 1, 0, x);
> >>>>>>>>         System.out.println("Derivative of |a|^x wrt x");
> >>>>>>>>         for (int p = 10; p < 21; p++) {
> >>>>>>>>             double a;
> >>>>>>>>             if (p < 20) {
> >>>>>>>>                 a = 1d / Math.pow(2d, p);
> >>>>>>>>             } else {
> >>>>>>>>                 a = 0d;
> >>>>>>>>             }
> >>>>>>>>             final DerivativeStructure a_ds = new DerivativeStructure(1, 1, a);
> >>>>>>>>             final DerivativeStructure out = a_ds.pow(dsA);
> >>>>>>>>             final double calc = (Math.pow(a, x + EPS) - Math.pow(a, x)) / EPS;
> >>>>>>>>             System.out.format("Derivative@%f=%f  %f\n", a, calc,
> >>>>>>>>                               out.getPartialDerivative(new int[]{1}));
> >>>>>>>>         }
> >>>>>>>>     }
> >>>>>>>>
> >>>>>>>> At this point I'm explicitly substituting the rule that
> >>>>>>>> derivative(|a|^x) = 0 for |a|=0.
> >>>>>>>
> >>>>>>> Yes, but this fails for x = 0, as the limit of the finite difference
> >>>>>>> is -infinity and not 0.
> >>>>>>>
> >>>>>>> You can build your own function which explicitly assumes a is constant
> >>>>>>> and takes care of special values as follows:
> >>>>>>>
> >>>>>>>  public static DerivativeStructure aToX(final double a,
> >>>>>>>                                         final DerivativeStructure x) {
> >>>>>>>      final double lnA = (a == 0 && x.getValue() == 0) ?
> >>>>>>>                   Double.NEGATIVE_INFINITY :
> >>>>>>>                   FastMath.log(a);
> >>>>>>>      final double[] function = new double[1 + x.getOrder()];
> >>>>>>>      function[0] = FastMath.pow(a, x.getValue());
> >>>>>>>      for (int i = 1; i < function.length; ++i) {
> >>>>>>>          function[i] = lnA * function[i - 1];
> >>>>>>>      }
> >>>>>>>      return x.compose(function);
> >>>>>>>  }
> >>>>>>>
> >>>>>>> This will work and provides derivatives to any order for almost any
> >>>>>>> values of a and x, including a=0, x=1 as in your example, but also
> >>>>>>> slightly better for a=0, x=0. However, it still has an important
> >>>>>>> drawback: it won't compute the n-th order derivative correctly for
> >>>>>>> a=0, x=0 and n > 1. It will provide NaN for these higher order
> >>>>>>> derivatives instead of +/-infinity according to the parity of n.
> >>>>>>
> >>>>>> I have added a similar function to the DerivativeStructure class (with
> >>>>>> some errors above corrected). The main interesting property of this
> >>>>>> function is that it is more accurate than converting a to a
> >>>>>> DerivativeStructure and using the general x^y function. It does its
> >>>>>> best to handle the special case, but as written above, this does NOT
> >>>>>> work for general combinations (i.e. more than one variable or more than
> >>>>>> one order). As soon as there is a combination, the derivative will
> >>>>>> involve something like df/dx * dg/dy, and as infinities and zeros are
> >>>>>> everywhere, NaN appears immediately for these partial derivatives. This
> >>>>>> cannot be avoided.
> >>>>>>
> >>>>>> If you stay away from the singularity, the function behaves correctly.
> >>>>>>
> >>>>>> best regards,
> >>>>>> Luc
> >>>>>>
> >>>>>>>
> >>>>>>> This is a known problem that we already encountered when dealing with
> >>>>>>> rootN. Here is an extract of a comment in the test case
> >>>>>>> testRootNSingularity, where similar NaN appears instead of +/-infinity.
> >>>>>>> The dsZero instance in the comment is simply the x parameter of the
> >>>>>>> function, as a DerivativeStructure with value 0.0 and depending on
> >>>>>>> itself (dsZero = new DerivativeStructure(1, maxOrder, 0, 0.0)):
> >>>>>>>
> >>>>>>>
> >>>>>>> // the following checks shows a LIMITATION of the current implementation
> >>>>>>> // we have no way to tell dsZero is a pure linear variable x = 0
> >>>>>>> // we only say: "dsZero is a structure with value = 0.0,
> >>>>>>> // first derivative = 1.0, second and higher derivatives = 0.0".
> >>>>>>> // Function composition rule for second derivatives is:
> >>>>>>> // d2[f(g(x))]/dx2 = f''(g(x)) * [g'(x)]^2 + f'(g(x)) * g''(x)
> >>>>>>> // when function f is the nth root and x = 0 we have:
> >>>>>>> // f(0) = 0, f'(0) = +infinity, f''(0) = -infinity (and higher
> >>>>>>> // derivatives keep switching between +infinity and -infinity)
> >>>>>>> // so given that in our case dsZero represents g, we have g(x) = 0,
> >>>>>>> // g'(x) = 1 and g''(x) = 0
> >>>>>>> // applying the composition rules gives:
> >>>>>>> // d2[f(g(x))]/dx2 = f''(g(x)) * [g'(x)]^2 + f'(g(x)) * g''(x)
> >>>>>>> //                 = -infinity * 1^2 + +infinity * 0
> >>>>>>> //                 = -infinity + NaN
> >>>>>>> //                 = NaN
> >>>>>>> // if we knew dsZero is really the x variable and not the identity
> >>>>>>> // function applied to x, we would not have computed f'(g(x)) * g''(x)
> >>>>>>> // and we would have found that the result was -infinity and not NaN
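
The arithmetic in this comment can be reproduced with plain IEEE 754 doubles.
The sketch below is only an illustration with hypothetical names; it does not
use DerivativeStructure and simply evaluates the second-order composition
rule quoted above with the values f' = +infinity, f'' = -infinity, g' = 1 and
g'' = 0:

```java
/** Hypothetical demo class, not part of Commons Math. */
final class CompositionNaN {

    /** Second-order composition rule: f''(g) * g'^2 + f'(g) * g''. */
    static double secondDerivative(final double f1, final double f2,
                                   final double g1, final double g2) {
        return f2 * g1 * g1 + f1 * g2;
    }

    public static void main(final String[] args) {
        final double inf = Double.POSITIVE_INFINITY;
        // f is the nth root at 0: f' = +infinity, f'' = -infinity.
        // g is the plain variable x: g' = 1, g'' = 0.
        final double composed = secondDerivative(inf, -inf, 1.0, 0.0);
        // Same case when the f'(g(x)) * g''(x) term is not computed at all.
        final double firstTermOnly = -inf * 1.0 * 1.0;
        System.out.println("full rule:       " + composed);      // NaN
        System.out.println("first term only: " + firstTermOnly); // -Infinity
    }
}
```

The NaN comes solely from the product +infinity * 0 in the second term.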
> >>>>>>>
> >>>>>>> Hope this helps
> >>>>>>> Luc
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Ajo.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Fri, Aug 23, 2013 at 9:39 AM, Luc Maisonobe
> >>>>> <[email protected]
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Ajo,
> >>>>>>>>>
> >>>>>>>>> On 23/08/2013 17:48, Ajo Fod wrote:
> >>>>>>>>>> Try this and I'm happy to explain if necessary:
> >>>>>>>>>>
> >>>>>>>>>> public class Derivative {
> >>>>>>>>>>
> >>>>>>>>>>     public static void main(final String[] args) {
> >>>>>>>>>>         DerivativeStructure dsA = new DerivativeStructure(1, 1, 0, 1d);
> >>>>>>>>>>         System.out.println("Derivative of constant^x wrt x");
> >>>>>>>>>>         for (int a = -3; a < 3; a++) {
> >>>>>>>>>
> >>>>>>>>> We have chosen the classical definition which implies c^r is not
> >>>>>>>>> defined for real r and negative c.
> >>>>>>>>>
> >>>>>>>>> Our implementation is based on the decomposition c^r = exp(r * ln(c)),
> >>>>>>>>> so the NaN comes from the logarithm when c <= 0.
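
A small plain-Java illustration of where NaN can enter when a power is
computed through this decomposition. This is not the Commons Math code, only
a sketch with hypothetical names; the derivative with respect to r is taken
as ln(c) * c^r:

```java
/** Hypothetical demo class, not part of Commons Math. */
final class ExpLogPow {

    /** c^r computed as exp(r * ln(c)). */
    static double pow(final double c, final double r) {
        return Math.exp(r * Math.log(c));
    }

    /** d(c^r)/dr computed as ln(c) * c^r. */
    static double dPowDr(final double c, final double r) {
        return Math.log(c) * pow(c, r);
    }

    public static void main(final String[] args) {
        final double r = 1.0;
        for (final double c : new double[] {-2.0, 0.0, 0.5, 2.0}) {
            // c < 0: Math.log(c) is NaN, so value and derivative are both NaN.
            // c == 0: Math.log(c) is -Infinity; the value exp(-Infinity) is 0,
            //         but the derivative -Infinity * 0 is NaN.
            System.out.format("c=%f  c^r=%f  d/dr=%f%n", c, pow(c, r), dPowDr(c, r));
        }
    }
}
```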
> >>>>>>>>>
> >>>>>>>>> Note also that, as explained in the documentation here:
> >>>>>>>>> <http://commons.apache.org/proper/commons-math/userguide/analysis.html#a4.7_Differentiation>,
> >>>>>>>>> there are no concepts of "constants" and "variables" in this framework,
> >>>>>>>>> so we cannot draw a line between c^r seen as a univariate function of
> >>>>>>>>> r, or as a univariate function of c, or as a bivariate function of c
> >>>>>>>>> and r, or even as a pentavariate function of p1, p2, p3, p4, p5 with
> >>>>>>>>> both c and r being computed elsewhere from p1...p5. So we don't make
> >>>>>>>>> special cases for the case c = 0, for example.
> >>>>>>>>>
> >>>>>>>>> Does this explanation make sense to you?
> >>>>>>>>>
> >>>>>>>>> best regards,
> >>>>>>>>> Luc
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>             final DerivativeStructure a_ds = new DerivativeStructure(1, 1, a);
> >>>>>>>>>>             final DerivativeStructure out = a_ds.pow(dsA);
> >>>>>>>>>>             System.out.format("Derivative@%d=%f\n", a,
> >>>>>>>>>>                               out.getPartialDerivative(new int[]{1}));
> >>>>>>>>>>         }
> >>>>>>>>>>     }
> >>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Aug 23, 2013 at 7:59 AM, Gilles
> >>>>> <[email protected]
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> On Fri, 23 Aug 2013 07:17:35 -0700, Ajo Fod wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Seems like the DerivativeCompiler returns NaN.
> >>>>>>>>>>>>
> >>>>>>>>>>>> IMHO it should return 0.
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> What should be 0?  And Why?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> Is this worthy of an issue?
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> As is, no.
> >>>>>>>>>>>
> >>>>>>>>>>> Gilles
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> -Ajo
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> ---------------------------------------------------------------------
> >>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> >>>>>>>>>>> For additional commands, e-mail: [email protected]
> >>>>>>>>>>>
