[ https://issues.apache.org/jira/browse/STATISTICS-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432981#comment-17432981 ]
Alex Herbert commented on STATISTICS-37:
----------------------------------------

This has been fixed by NUMBERS-171. The tests have been updated to use higher tolerances. The normal distribution tolerance was raised to approximately 3.5e-15. This is now one of the most accurate distributions compared to the reference data.

Note that the NormalDistribution computed the value to pass to the error function as:
{noformat}
xx = (x - mu) / (sd * sqrt(2))
{noformat}
This differs from the other implementations, which appear to compute (verified by reverse engineering their results):
{noformat}
xx = (x - mu) / sd / sqrt(2)
{noformat}
The present implementation has only 1 divide. The factor sqrt(2 * sd * sd) can be computed in the constructor (as sketched below). To maximise precision I have implemented this using extended precision floating point operations. The method is exact when tested on 500 random values in [1, 2). For reference, the following errors are observed on the same data for standard precision computations:

||Method||Max Error||RMS Error||
|sd * sqrt(2)|1.0598|0.4781|
|sqrt(2 * sd * sd)|0.7780|0.2144|

These have been measured using BigDecimal from an extended precision result to 25 digits. The sd * sqrt(2) method was the original method and has the largest errors. Replacing it with a method that computes the exact factor eliminates a source of error. This is warranted as the new error function in Numbers has a maximum error of 2 ULP.

I updated the TruncatedNormalDistribution to exploit the high precision computation for a range between x0 and x1. This is based on the ErfDifference class in Numbers (a sketch follows below). It is more accurate than the previous method using cdf(x1) - cdf(x0). This distribution now computes very accurate probabilities.

The distribution has a numerically unstable computation for the mean and variance. When using the updated logic for the range, this resulted in a variance of zero for one test case due to cancellation. I have left the computation of the moments alone. These moments are not used in the rest of the distribution; the inverse CDF uses the inverse error function, not a default search bounded by a range defined by the mean and variance. A ticket should be raised to track improvements to the moments computation.

The Levy distribution tolerance was raised from 1e-9 to 1e-14. Some test data required regeneration.
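As a minimal sketch of the single-divide formulation and the use of the inverse complementary error function described above, the following illustrates the idea against the Commons Numbers gamma package (Erfc, InverseErfc). The class, field and method names are illustrative only, and the denominator here is computed in standard precision rather than with the extended precision operations used in the actual fix:
{noformat}
import org.apache.commons.numbers.gamma.Erfc;
import org.apache.commons.numbers.gamma.InverseErfc;

/**
 * Sketch of a normal CDF / inverse CDF using a single divide by a
 * precomputed denominator (names are illustrative, not the actual
 * Commons Statistics implementation).
 */
class NormalSketch {
    private final double mu;
    /** Denominator sd * sqrt(2), computed once in the constructor. */
    private final double denom;

    NormalSketch(double mu, double sd) {
        this.mu = mu;
        // Standard precision here; the actual fix computes sqrt(2 * sd * sd)
        // using extended precision operations to make the factor exact.
        this.denom = Math.sqrt(2 * sd * sd);
    }

    /** CDF: P(X <= x) = 0.5 * erfc((mu - x) / denom), using a single divide. */
    double cumulativeProbability(double x) {
        return 0.5 * Erfc.value((mu - x) / denom);
    }

    /** Inverse CDF via the inverse complementary error function. */
    double inverseCumulativeProbability(double p) {
        return mu - denom * InverseErfc.value(2 * p);
    }
}
{noformat}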
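The harness behind the error table above is not shown in the comment; the following is an assumed sketch of how such a ULP comparison against a 25 digit BigDecimal reference could be performed (the FactorErrorSketch class and its random data are hypothetical, only the two factor expressions come from the comment; BigDecimal.sqrt requires Java 9+):
{noformat}
import java.math.BigDecimal;
import java.math.MathContext;
import java.util.SplittableRandom;

/** Sketch of measuring ULP error against a 25 digit BigDecimal reference. */
class FactorErrorSketch {
    private static final MathContext MC = new MathContext(25);

    /** Absolute error of the computed value in units of ULP of the reference. */
    static double ulpError(double computed, BigDecimal reference) {
        return new BigDecimal(computed).subtract(reference)
            .divide(new BigDecimal(Math.ulp(reference.doubleValue())), MC)
            .abs().doubleValue();
    }

    public static void main(String[] args) {
        SplittableRandom rng = new SplittableRandom(42);
        BigDecimal sqrt2 = new BigDecimal(2).sqrt(MC);
        double max1 = 0;
        double max2 = 0;
        for (int i = 0; i < 500; i++) {
            // Random standard deviation in [1, 2) as used in the comment above.
            double sd = 1 + rng.nextDouble();
            // Reference value of sd * sqrt(2) to 25 digits.
            BigDecimal exact = new BigDecimal(sd).multiply(sqrt2, MC);
            max1 = Math.max(max1, ulpError(sd * Math.sqrt(2), exact));
            max2 = Math.max(max2, ulpError(Math.sqrt(2 * sd * sd), exact));
        }
        System.out.printf("sd * sqrt(2)      max ULP error = %.4f%n", max1);
        System.out.printf("sqrt(2 * sd * sd) max ULP error = %.4f%n", max2);
    }
}
{noformat}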
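The following is a minimal sketch of the ErfDifference based range computation mentioned above. The class, its fields and the normalisation handling are illustrative assumptions; only ErfDifference.value is the Commons Numbers API referenced in the comment, and the real TruncatedNormalDistribution additionally clips the requested range to the truncation bounds and handles edge cases:
{noformat}
import org.apache.commons.numbers.gamma.ErfDifference;

/**
 * Sketch of computing P(x0 < X <= x1) for the parent normal as a single
 * erf difference rather than cdf(x1) - cdf(x0).
 */
class TruncatedNormalSketch {
    private final double mu;
    /** Denominator sd * sqrt(2). */
    private final double denom;
    /** Parent normal probability of the truncation interval [lower, upper]. */
    private final double normalisation;

    TruncatedNormalSketch(double mu, double sd, double lower, double upper) {
        this.mu = mu;
        this.denom = Math.sqrt(2 * sd * sd);
        this.normalisation = parentProbability(lower, upper);
    }

    /** Parent probability: 0.5 * (erf(z1) - erf(z0)) with zi = (xi - mu) / denom. */
    private double parentProbability(double x0, double x1) {
        // ErfDifference.value(a, b) = erf(b) - erf(a), computed to retain
        // precision when the two arguments are close.
        return 0.5 * ErfDifference.value((x0 - mu) / denom, (x1 - mu) / denom);
    }

    /** Probability of (x0, x1] under the truncated distribution. */
    double probability(double x0, double x1) {
        return parentProbability(x0, x1) / normalisation;
    }
}
{noformat}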
> Better inverse Erfc function for the Normal distribution
> --------------------------------------------------------
>
>                 Key: STATISTICS-37
>                 URL: https://issues.apache.org/jira/browse/STATISTICS-37
>             Project: Apache Commons Statistics
>          Issue Type: Improvement
>          Components: distribution
>    Affects Versions: 1.0
>            Reporter: Alex Herbert
>            Priority: Major
>
> The Normal distribution uses InverseErfc for the inverse CDF. This is a wrapper for InverseErf and as such it is limited in precision as p -> 0 since the method computes using (2p - 1). When p is < 1e-16 the inverse CDF returns -infinity. The smallest z is around -8.2, as computed by matlab:
> norminv(1e-16)
> -8.222082216130437
> scipy, R, matlab, octave, Mathematica all have methods to invert small p. Here is the output from matlab for norminv(5e-324): -38.472346342768788. It is accurate down to the minimum value for a 64-bit float.
> There are two free libraries that seem to be used for this: Boost and Cephes (function ndtri). The Boost licence is compatible with Apache and there are some Boost derived works already in Commons Numbers. Incorporating the Boost Erf functions into Numbers would be useful.
> The inability to invert the Erfc for all p affects the Normal, Truncated normal and Levy distributions.
> Boost error function inverses:
> [https://www.boost.org/doc/libs/1_77_0/libs/math/doc/html/math_toolkit/sf_erf/error_inv.html]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)