Re: [Statistics] Convention when outside support?

Gilles Sadowski Fri, 29 Nov 2019 10:25:50 -0800

Hi.

On Fri, 29 Nov 2019 at 18:41, Alex Herbert <alex.d.herb...@gmail.com> wrote:
>
> On 29/11/2019 16:48, Gilles Sadowski wrote:
> > Hello.
> >
> > For all implemented distributions, what convention should be adopted
> > when methods
> >   * density(x)
> >   * logDensity(x)
> >   * cumulativeProbability(x)
> > are called with "x" out of the "support" bounds?
> >
> > Currently some (but not all[1]) are documented to return "NaN".
> > An alternative could be to throw an exception.
>
> The convention in the java.lang.Math class is to return NaN for things
> that do not make sense, e.g.
>
> Math.log(-1)
> Math.asin(4)
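
For illustration, a minimal sketch of that java.lang.Math convention. The class name NaNConventionDemo and the helper sumOfLogs are made up for this example; only Math.log, Math.asin and Double.isNaN come from the JDK. Both calls return NaN without throwing, and a single NaN then carries through any later arithmetic.

```java
// Illustrative only: NaNConventionDemo and sumOfLogs are hypothetical names.
class NaNConventionDemo {
    /** Sums the natural logs of the inputs; one invalid input makes the whole sum NaN. */
    static double sumOfLogs(double... values) {
        double sum = 0;
        for (double v : values) {
            sum += Math.log(v); // Math.log returns NaN for v < 0, it does not throw
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(Math.log(-1));                 // prints NaN
        System.out.println(Math.asin(4));                 // prints NaN
        System.out.println(sumOfLogs(1.0, -1.0, 2.0));    // prints NaN
        // NaN is not equal to itself, so callers must test with Double.isNaN.
        System.out.println(Double.isNaN(Math.log(-1)));   // prints true
    }
}
```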


But are we in the same kind of (wrong) usage when considering
the argument to the above methods?
I mean: If we ask the question of "What is the density at x?", is
it really an error to reply "0" when outside the domain?
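
To make the "reply 0" option concrete, here is a rough sketch for an exponential distribution. ExponentialSketch is a hypothetical stand-alone class, not the Commons Statistics implementation; it only shows what the three methods could return below the lower bound of the support if "outside the domain" is treated as a valid question.

```java
// Hypothetical stand-alone class, not the Commons Statistics implementation.
class ExponentialSketch {
    private final double rate;

    ExponentialSketch(double rate) {
        // Bad parameters are still rejected at construction time.
        if (!(rate > 0)) {
            throw new IllegalArgumentException("rate must be positive: " + rate);
        }
        this.rate = rate;
    }

    /** Density: 0 below the support, rate * exp(-rate * x) otherwise. */
    double density(double x) {
        return x < 0 ? 0 : rate * Math.exp(-rate * x);
    }

    /** Log-density: log(0) = -infinity below the support. */
    double logDensity(double x) {
        return x < 0 ? Double.NEGATIVE_INFINITY : Math.log(rate) - rate * x;
    }

    /** CDF: no probability mass lies below the support, so the result is 0 there. */
    double cumulativeProbability(double x) {
        return x < 0 ? 0 : -Math.expm1(-rate * x);
    }
}
```

With this choice a NaN argument still produces NaN, since the comparison "x < 0" is false for NaN and the arithmetic then propagates it.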

> This leaves it as the responsibility of the caller to know when it may
> be possible to pass in a bad value and so check the results.
>
> It unfortunately leaves open the issue that not everyone will do that
> and so their program can be brought to a stop by presence of NaN values
> that may have appeared some way further back in the computation.
>
> Throwing an exception seems to be the only way to preserve the stack
> trace of where the computation went wrong.
>
> So either case has merit.
>
> What do other languages do? A few seem to return 0 for out of support.
>
> I had a look at Python. Here there is not much consistency using scipy:
>
>  >>> import math
>  >>> from scipy.stats import gamma
>  >>> gamma.pdf(0.5, 1.99)
> 0.3066586069413397
>  >>> gamma.pdf(-0.5, 1.99)
> 0.0
>  >>> gamma.logpdf(-0.5, 1.99)
> -inf
>  >>> math.log(0)
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
> ValueError: math domain error
>
> So scipy returns 0 for the density function when outside support. It
> returns -inf for the log density there, whereas Python's math.log raises
> an exception for the log of zero.
>
> In R the behaviour is the same as in Python, except that the log
> of zero is -Inf.
>
>  > dgamma(0, 2)
> [1] 0
>  > dgamma(-1, 2)
> [1] 0
>  > dgamma(-1, 2, log=TRUE)
> [1] -Inf
>  > log(0)
> [1] -Inf
>
> So returning 0 is another option. However this cannot distinguish a
> valid return of 0 from an error.
>
> Note that if we did not have double as a return value then throwing an
> exception would be the primary choice for signalling error as there is
> no NaN for other numbers. However there are documented cases for
> computations in the JDK which do not make sense that avoid throwing
> exceptions as in Math.abs(int) for Integer.MIN_VALUE which still returns
> a negative.
>
> I'm not a fan of static properties to configure the behaviour either
> way. I don't think using zero is a good idea as it cannot signal
> something is wrong.
>
> I would favour one of the following:
>
> - Provide alternative methods to return NaN or throw
> - Always return NaN (which seems more Java conventional) and provide a
> wrapper distribution that can wrap calls to density, logDensity and
> cumulativeProbability and throw an exception if the underlying
> distribution returns NaN.
> - Always throw (which forces users to safe usage) and provide a wrapper
> distribution that can wrap calls to density, logDensity and
> cumulativeProbability and return NaN or zero if the underlying
> distribution throws.
>
> Consider the inconsistency: creating a distribution with a bad value
> throws an exception, but using a distribution with a bad value returns
> NaN. Given that, it seems to me that throwing an exception may be the
> more sensible approach. A wrapper that guards against exceptions can be
> user-configurable to return NaN or zero.
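
As a rough sketch of the "always return NaN, plus a throwing wrapper" option from the list above: the interface ContinuousSketch and the classes UnitUniformNaN and ThrowingOnNaN are hypothetical names invented for this example, not part of any existing API.

```java
// All names here are hypothetical; this is not the Commons Statistics API.
interface ContinuousSketch {
    double density(double x);
    double logDensity(double x);
    double cumulativeProbability(double x);
}

/** Uniform distribution on [0, 1] that returns NaN outside its support. */
final class UnitUniformNaN implements ContinuousSketch {
    private static boolean inSupport(double x) {
        return x >= 0 && x <= 1;
    }
    public double density(double x) {
        return inSupport(x) ? 1 : Double.NaN;
    }
    public double logDensity(double x) {
        return inSupport(x) ? 0 : Double.NaN;
    }
    public double cumulativeProbability(double x) {
        return inSupport(x) ? x : Double.NaN;
    }
}

/** Wrapper that turns a NaN result from the delegate into an exception. */
final class ThrowingOnNaN implements ContinuousSketch {
    private final ContinuousSketch delegate;

    ThrowingOnNaN(ContinuousSketch delegate) {
        this.delegate = delegate;
    }

    private static double check(double result, double x) {
        if (Double.isNaN(result)) {
            // Thrown at the call site, so the stack trace points at the bad call.
            throw new IllegalArgumentException("Out of support: " + x);
        }
        return result;
    }

    public double density(double x) {
        return check(delegate.density(x), x);
    }
    public double logDensity(double x) {
        return check(delegate.logDensity(x), x);
    }
    public double cumulativeProbability(double x) {
        return check(delegate.cumulativeProbability(x), x);
    }
}
```

In this direction the wrapper adds only a Double.isNaN test per call on the valid path; an exception is created only when a caller has opted in to strict behaviour and actually passes a bad value.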

Instantiating and raising an exception is (relatively) costly.
So if "return NaN" is wanted precisely because performance matters,
a wrapper that obtains NaN by catching an exception would defeat
that purpose.
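
A minimal sketch of the concern, with made-up names (NaNOnThrowSketch, strictDensity, lenientDensity) and a uniform distribution on [0, 1] as a stand-in: the lenient method does return NaN, but only after the strict method has already built and thrown the exception.

```java
// Hypothetical names; a sketch of the "always throw, wrap to get NaN" option.
class NaNOnThrowSketch {
    /** Strict density of the uniform distribution on [0, 1]: throws outside the support. */
    static double strictDensity(double x) {
        if (!(x >= 0 && x <= 1)) {
            throw new IllegalArgumentException("Out of support: " + x);
        }
        return 1;
    }

    /** Lenient variant: converts the exception from strictDensity into NaN. */
    static double lenientDensity(double x) {
        try {
            return strictDensity(x);
        } catch (IllegalArgumentException e) {
            // The exception object, including its stack trace, has already been
            // created by strictDensity, so that cost is paid even though the
            // exception is discarded here.
            return Double.NaN;
        }
    }
}
```

Every out-of-support call through lenientDensity therefore pays for an exception, which is typically much more expensive than a plain comparison followed by "return Double.NaN".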

Gilles

>
> Alex
> > Regards,
> > Gilles
> >
> > [1] https://issues.apache.org/jira/projects/MATH/issues/MATH-1503
> >

