2011/11/28 Phil Steitz <phil.ste...@gmail.com>: > On 11/28/11 12:18 AM, Sébastien Brisard wrote: >> Hello, >> while working on MATH-711, I think I stumbled upon some >> inconsistencies in the implementation of the Pascal distribution. In >> fact, these might well be a bug, but since I'm a bit rusty on >> probabilities, I would be grateful for some feedback. >> Here is the thing. The header of the Javadoc states that the random >> variable (say X) being represented by this class is the number of >> *failures*. First of all, I'm not questioning this convention, but I >> think the current Javadoc is slightly confusing, because it refers to >> the Wikipedia website, where the opposite convention is adopted (X is >> the number of *successes*). This would not matter too much, but the >> Javadoc would require some alterations -- I think. >> More disturbing: the names of the parameters of the distribution are >> not consistent with the wikipedia page being referred to: >> - p is the probability of success in both cases: OK, >> - r is the number of failures in Wikipedia, but the number of >> successes in CM... I would suggest that we at least rename this >> parameter s or something. Except if it is generally accepted by the >> "Pascal community" that the first parameter should always be called r, >> no matter the convention. In which case I would recommend that the >> references to Wikipedia be removed. >> >> Finally, the bug. According to my hand-calcs, I think that with the >> notations of CM, the mean of X is given by >> mean(X) = r * (1 - p) / p, >> while the currently implemented formula is >> r * p / (1 - p), >> which is actually the formula corresponding to the Wikipedia convention. >> >> Here is a very naive piece of code supporting my calcs >> {code} >> public class PascalDistributionDemo { >> public static void main(String[] args) { >> final int r = 10; >> final double p = 0.2; >> final int numTerms = 1000; >> final PascalDistribution distribution = new PascalDistribution(r, p); >> double mean = 0.; >> for (int k = numTerms - 1; k >= 0; k--) { >> mean += k * distribution.probability(k); >> } >> // The following prints 40.00000000000012 >> System.out.println("Estimate of the mean = " + mean); >> // The following prints 2.5 >> System.out.println("CM implementation = " + >> distribution.getNumericalMean()); >> // The following prints 2.5 >> System.out.println("r * p / (1 - p) = " + (r * p / (1 - p))); >> // The following prints 40.0 >> System.out.println("r * (1 - p) / p = " + (r * (1 - p) / p)); >> } >> } >> {code} >> >> Again, I'm not an expert in this area, so I might be very, very wrong, >> and I apologize beforehand if that's the case. If I'm right, though, I >> think that >> 1. the Javadoc needs to be seriously rewritten, and the convention >> adopted *must* be stated very clearly, >> 2. the implementation needs thorough verification. > > I looked a little more carefully at this. I think the mean formula > is incorrect, so this is a bug. Regarding the convention used, it > is clearly stated in the class javadoc, but you are right it should > be made more clear exactly what is being computed. I am -0 for > changing the implementation, because that would force current users > to switch success/failure. It would be best, I think to add full > formulas to the javadoc. I don't think it is necessary to use the > same letters as the references do. > > Phil >> >> I'm happy to do that and open a ticket on this issue once you give me the >> go... >> >> Best, >> Sébastien >> >> Thanks for discovering this! I did the implementation, and apparently assumed notation as the referred source. Sorry for this.
We should also check the variance - this might be suffering from wrong notation as well. Cheers, Mikkel. --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org