Re: [math] Inconsistencies (bugs) in PascalDistribution?

Mikkel Meyer Andersen Mon, 28 Nov 2011 14:09:30 -0800

2011/11/28 Phil Steitz <phil.ste...@gmail.com>:
> On 11/28/11 12:18 AM, Sébastien Brisard wrote:
>> Hello,
>> while working on MATH-711, I think I stumbled upon some
>> inconsistencies in the implementation of the Pascal distribution. In
>> fact, these might well be a bug, but since I'm a bit rusty on
>> probabilities, I would be grateful for some feedback.
>> Here is the thing. The header of the Javadoc states that the random
>> variable (say X) being represented by this class is the number of
>> *failures*. First of all, I'm not questioning this convention, but I
>> think the current Javadoc is slightly confusing, because it refers to
>> the Wikipedia website, where the opposite convention is adopted (X is
>> the number of *successes*). This would not matter too much, but the
>> Javadoc would require some alterations -- I think.
>> More disturbing: the names of the parameters of the distribution are
>> not consistent with the wikipedia page being referred to:
>>   - p is the probability of success in both cases: OK,
>>   - r is the number of failures in Wikipedia, but the number of
>> successes in CM... I would suggest that we at least rename this
>> parameter s or something. Except if it is generally accepted by the
>> "Pascal community" that the first parameter should always be called r,
>> no matter the convention. In which case I would recommend that the
>> references to Wikipedia be removed.
>>
>> Finally, the bug. According to my hand-calcs, I think that with the
>> notations of CM, the mean of X is given by
>> mean(X) = r * (1 - p) / p,
>> while the currently implemented formula is
>> r * p / (1 - p),
>> which is actually the formula corresponding to the Wikipedia convention.
>>
>> Here is a very naive piece of code supporting my calcs
>> {code}
>> public class PascalDistributionDemo {
>>     public static void main(String[] args) {
>>         final int r = 10;
>>         final double p = 0.2;
>>         final int numTerms = 1000;
>>         final PascalDistribution distribution = new PascalDistribution(r, p);
>>         double mean = 0.;
>>         for (int k = numTerms - 1; k >= 0; k--) {
>>             mean += k * distribution.probability(k);
>>         }
>>         // The following prints 40.00000000000012
>>         System.out.println("Estimate of the mean = " + mean);
>>         // The following prints 2.5
>>         System.out.println("CM implementation = " +
>>                            distribution.getNumericalMean());
>>         // The following prints 2.5
>>         System.out.println("r * p / (1 - p) = " + (r * p / (1 - p)));
>>         // The following prints 40.0
>>         System.out.println("r * (1 - p) / p = " + (r * (1 - p) / p));
>>     }
>> }
>> {code}
>>
>> Again, I'm not an expert in this area, so I might be very, very wrong,
>> and I apologize beforehand if that's the case. If I'm right, though, I
>> think that
>> 1. the Javadoc needs to be seriously rewritten, and the convention
>> adopted *must* be stated very clearly,
>> 2. the implementation needs thorough verification.
>
> I looked a little more carefully at this.  I think the mean formula
> is incorrect, so this is a bug.  Regarding the convention used, it
> is clearly stated in the class javadoc, but you are right it should
> be made more clear exactly what is being computed.  I am -0 for
> changing the implementation, because that would force current users
> to switch success/failure.  It would be best, I think to add full
> formulas to the javadoc.  I don't think it is necessary to use the
> same letters as the references do.
>
> Phil
>>
>> I'm happy to do that and open a ticket on this issue once you give me the 
>> go...
>>
>> Best,
>> Sébastien
>>
>>
Thanks for discovering this! I did the implementation, and apparently
assumed notation as the referred source. Sorry for this.


We should also check the variance - this might be suffering from wrong
notation as well.

Cheers, Mikkel.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [math] Inconsistencies (bugs) in PascalDistribution?

Reply via email to