[ 
https://issues.apache.org/jira/browse/STATISTICS-35?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Herbert resolved STATISTICS-35.
------------------------------------
    Fix Version/s: 1.0
       Resolution: Fixed

Switched sampling to a Gaussian approximation for a large mean. The threshold 
is 0.5 * Integer.MAX_VALUE (just under 2^30).

Verified that the Gaussian approximation with the 0.5 shift on the mean passes 
a significance test on the quartiles of the samples for means as low as 10. 
Without the 0.5 shift, the null hypothesis is rejected for small means.

Shifting the Gaussian mean by +0.5 effectively performs rounding on the samples 
generated from the Gaussian sampler to the nearest integer when using a cast to 
an int:
{code:java}
double y = /* generate Gaussian sample */;
// For non-negative y the cast truncates, so adding 0.5 rounds to nearest
int x = (int) (y + 0.5);
{code}
Added in commit:

e48e870efe55c06e91b455953e47ffd498429fd3
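
For illustration only, the effect of the +0.5 shift can be sketched as follows. 
This is not the code from the commit: the class and method names are 
hypothetical, java.util.Random stands in for the Commons RNG Gaussian sampler, 
and the clamp at zero is an added assumption.

```java
import java.util.Random;

/** Illustrative sketch only; not the Commons Statistics implementation. */
public class GaussianPoissonSketch {

    /** Rounds a non-negative value to the nearest int via a +0.5 shift and a cast. */
    static int roundByShift(double y) {
        return (int) (y + 0.5);
    }

    /**
     * Draws an approximate Poisson sample for a large mean from a Gaussian
     * whose mean and variance both equal the Poisson mean.
     */
    static int sample(Random rng, double mean) {
        double y = mean + Math.sqrt(mean) * rng.nextGaussian();
        // Assumption: clamp at zero because a Poisson variate is never negative.
        return Math.max(0, roundByShift(y));
    }

    public static void main(String[] args) {
        Random rng = new Random(42L);
        double mean = 1e6;
        int n = 100_000;
        double sum = 0;
        for (int i = 0; i < n; i++) {
            sum += sample(rng, mean);
        }
        // The sample mean should be close to the requested mean.
        System.out.printf("sample mean = %.1f%n", sum / n);
    }
}
```

A plain cast without the shift truncates toward zero, which would bias the 
samples low by about 0.5 on average.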

> PoissonDistribution cannot create a sampler with a mean above 2^30
> ------------------------------------------------------------------
>
>                 Key: STATISTICS-35
>                 URL: https://issues.apache.org/jira/browse/STATISTICS-35
>             Project: Apache Commons Statistics
>          Issue Type: Bug
>          Components: distribution
>    Affects Versions: 1.0
>            Reporter: Alex Herbert
>            Priority: Minor
>             Fix For: 1.0
>
>         Attachments: normal_approx_delta.jpg
>
>
> The PoissonDistribution can be parameterised with any mean and yet the 
> PoissonSampler in RNG excludes a mean above 2^30 to avoid truncation. This 
> change was made to avoid distribution truncation and because the algorithm 
> uses (int) floor(mean) (see [RNG-52]).
> Commons RNG now has a LongSampler interface and it may be possible to create 
> a PoissonSampler that outputs a long with the same algorithm and avoid 
> truncation to int values. The final PoissonDistribution is still bounded by 
> int values but the sampler can be less restrictive if it uses long. At large 
> means the Poisson sampler mainly uses a Gaussian sampler and runtime 
> performance should not be impacted.
> Investigate if the PoissonSampler algorithm can create samples from the 
> PoissonDistribution that pass a chi-square test when the mean is large. If 
> this is possible then implement this in RNG. Otherwise inverse transform 
> sampling will have to be used for large mean (with an anticipated large 
> performance impact).
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
