On 8/7/12 9:36 AM, Phil Steitz wrote: > On 8/6/12 11:16 PM, Dennis Hendriks wrote: >> See below. >> >> On 08/06/2012 05:29 PM, Phil Steitz wrote: >>> >>> >>> >>> On Aug 6, 2012, at 6:06 AM, Dennis Hendriks<d.hendr...@tue.nl> >>> wrote: >>> >>>> See below. >>>> >>>> Dennis >>>> >>>> >>>> On 08/06/2012 02:48 PM, Phil Steitz wrote: >>>>> >>>>> >>>>> >>>>> On Aug 5, 2012, at 11:21 PM, Dennis >>>>> Hendriks<d.hendr...@tue.nl> wrote: >>>>> >>>>>> See below. >>>>>> >>>>>> On 08/06/2012 12:49 AM, Gilles Sadowski wrote: >>>>>>> On Sun, Aug 05, 2012 at 12:54:11PM -0700, Phil Steitz wrote: >>>>>>>> On 8/4/12 10:57 AM, Gilles Sadowski wrote: >>>>>>>>> Hello. >>>>>>>>> >>>>>>>>> Referring to this failed test (cf. messages from Continuum): >>>>>>>>> ---CUT--- >>>>>>>>> org.apache.commons.math3.exception.NumberIsTooLargeException: >>>>>>>>> lower bound (65) must be strictly less than upper bound (65) >>>>>>>>> at >>>>>>>>> org.apache.commons.math3.distribution.UniformIntegerDistribution.<init>(UniformIntegerDistribution.java:73) >>>>>>>>> at >>>>>>>>> org.apache.commons.math3.distribution.UniformIntegerDistribution.<init>(UniformIntegerDistribution.java:53) >>>>>>>>> at >>>>>>>>> org.apache.commons.math3.stat.descriptive.AggregateSummaryStatisticsTest.generatePartition(AggregateSummaryStatisticsTest.java:275) >>>>>>>>> at >>>>>>>>> org.apache.commons.math3.stat.descriptive.AggregateSummaryStatisticsTest.testAggregationConsistency(AggregateSummaryStatisticsTest.java:89) >>>>>>>>> >>>>>>>>> >>>>>>>>> It is due to a precondition check while creating the >>>>>>>>> "UniformIntegerDistribution" instance: >>>>>>>>> ---CUT--- >>>>>>>>> if (lower>= upper) { >>>>>>>>> throw new NumberIsTooLargeException( >>>>>>>>> >>>>>>>>> LocalizedFormats.LOWER_BOUND_NOT_BELOW_UPPER_BOUND, >>>>>>>>> lower, upper, false); >>>>>>>>> } >>>>>>>>> ---CUT--- >>>>>>>>> >>>>>>>>> The test referred to above was using this code (before I >>>>>>>>> changed it use a >>>>>>>>> "UniformIntegerDistribution" instance): >>>>>>>>> ---CUT--- >>>>>>>>> final int next = (i == 4 || cur == length - 1) ? length - 1 >>>>>>>>> : randomData.nextInt(cur, length - 1); >>>>>>>>> ---CUT--- >>>>>>>>> >>>>>>>>> It is now (after the change): >>>>>>>>> ---CUT--- >>>>>>>>> final IntegerDistribution partitionPoint = new >>>>>>>>> UniformIntegerDistribution(cur, length - 1); >>>>>>>>> final int next = (i == 4 || cur == length - 1) ? length - 1 >>>>>>>>> : partitionPoint.sample(); >>>>>>>>> ---CUT--- >>>>>>>>> >>>>>>>>> Thus, AFAIK, the failure did not appear before because >>>>>>>>> there was no >>>>>>>>> precondition enforcement in "nextInt". >>>>>>>>> >>>>>>>>> The question is: Was the code in the test correct (in >>>>>>>>> allowing the same >>>>>>>>> value for both bounds? >>>>>>>>> * In the negative, how to change it? >>>>>>>>> * The affirmative would mean that the precondition check in >>>>>>>>> "UniformIntegerDistribution" should be relaxed to allow >>>>>>>>> equal bounds. >>>>>>>>> Does this make sense? >>>>>>>>> If so, can we change it now, or is it forbidden in >>>>>>>>> order to stay >>>>>>>>> backwards compatible? >>>>>>>> Your analysis above is correct. The failure after the >>>>>>>> change is due >>>>>>>> to the fact that post-change the distribution is >>>>>>>> instantiated before >>>>>>>> the bounds check. I changed the test to fix this. >>>>>>> Thanks. >>>>>>> >>>>>>>> Both the >>>>>>>> randomData nextInt and the UniformIntegerDistribution >>>>>>>> constructor >>>>>>>> now forbid the degenerate case where there is only one point >>>>>>>> in the >>>>>>>> domain. In retrospect, I guess it would have probably been >>>>>>>> better >>>>>>>> to allow this degenerate case. Unfortunately, this would be an >>>>>>>> incompatible change, so will have to wait until 4.0 if we >>>>>>>> want to do it. >>>>>>>> >>>>>>>> The original code above illustrates the convenience of being >>>>>>>> able to >>>>>>>> just make direct calls to randomData.nextXxx, which is why this >>>>>>>> class exists ;) >>>>>>> As I wrote in another post, I'm not against the convenience >>>>>>> methods. But >>>>>>> IMO, they should be located in a new "DistributionUtils" class. >>>>>>> And we should also find a way to remove the code duplication >>>>>>> (in the >>>>>>> distribution's "sample()" method and in the corresponding >>>>>>> "next..." method). >>>>>>> >>>>>> The RandomData class (or whatever it would be called) does >>>>>> indeed seem useful. If we plan to keep it, we should probably >>>>>> make sure that there is a sample/next/... method in that class >>>>>> for EVERY distribution, as some of them are missing, if I >>>>>> remember correctly. Perhaps this is a separate issue though? >>>>>> >>>>> All have the method now, but the impls delegate to >>>>> RandomDataImpl. In some cases, there is nothing better >>>>> implemented than just inversion, provided by the default >>>>> inversion sampler. That is OK. What we need to do is just >>>>> move the implementations of the default and specialized >>>>> samplers to the actual distribution classes. These can't be >>>>> static, as they use the RamdomData instance. I will take care >>>>> of this. >>>> I'm not sure if I made myself clear. I meant to say that not all >>>> distributions have a corresponding nextX method in >>>> RandomData(Impl). What I propose is to make sure that for every >>>> distribution class, there is a corresponding method in >>>> RandomData(Impl) to make sure that RandomData(Impl) is actually >>>> a substitute for using the distributions. >>> I get it now, but I don't think I agree. RandomData is meant to >>> be a general-purpose class used for generating random data with >>> commonly desired characteristics, like coming from commonly used >>> distributions such as Poisson, Gaussian, etc. This class >>> predates the sample() method that has been added to *all* >>> distributions. The default implementation of sampling for real >>> distributions (inversion-based sampling) has no dependency on >>> RandomDataImpl, but the specialized implementations (impls that >>> are better than inversion) for some distributions still live >>> there. Here is what I would like to do: >>> >>> 1. Move the specialized implementations from RandomDataImpl to >>> the distributions that they sample from. >>> >>> 2. Rename RandomDataImpl to merge the impl and the interface >>> >>> I think we all agree on 1. Regarding 2, I would personally >>> prefer to leave the represented distributions as is, sticking >>> with just the most commonly used distributions, having the >>> methods accept parameters, but maintaining generators as instance >>> variables so generator state can be maintained between calls >>> (like sample() and the existing RandomDataImpl behave now). If >>> others feel strongly that every distribution should be >>> represented, I am OK with that but would see it as clutter in the >>> RD replacement. >> I definitely do agree on 1. I also agree on 2 (the rename/merge). >> However, I thought the goal of having RandomData is to be convenient: >> >> - Distributions require a separate instance per distributions. >> RandomData requires only a single instance. >> >> - Distributions require a separate instance per combination of >> parameters. RandomData allows providing the parameters when one >> actually asks for samples. >> >> If so, then I don't see why that would be the case for only some >> of the distributions, and not for others...? Am I missing >> something here? > Only that adding every distribution would clutter the class, IMO. > As I said above, RandomData was originally intended to be just a > sort of souped-up Random, making the underlying PRNG pluggable and > supporting more kinds of random data generation than what Random > provides. If you feel strongly that there are lots of practical > applications for generating random data from, say Weibull or > ChiSquare distributions, I am fine adding them all to the RandomData > replacement. The rationale for limiting is that it makes it a > little easier to find the commonly used ones (exponential, poissson, > gaussian). If others disagree, I am OK adding them all (as I see > they have now been added to RandomDataImpl, but not the interface).
In r1375192, I created RandomDataGenerator to replace RandomDataImpl and included support for all distributions. I also changed the versions that were just using nextInversionDeviate to actually use the configured RandomGenerator. Phil > > Phil >> Dennis >> >> >>> Phil >>> >>> >>>> Obviously, the implementation should be only in the >>>> distributions OR in the RandomDataImpl. I agree that in the >>>> distributions would be better. I like the static methods in >>>> distributions idea. >>>> >>>>> Phil >>>>> >>>>>>> Gilles >>>>>>> >>>>>>> --------------------------------------------------------------------- >>>>>>> >>>>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>>>>>> For additional commands, e-mail: dev-h...@commons.apache.org >>>>>>> >>>>>> --------------------------------------------------------------------- >>>>>> >>>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>>>>> For additional commands, e-mail: dev-h...@commons.apache.org >>>>>> >>>>> --------------------------------------------------------------------- >>>>> >>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>>>> For additional commands, e-mail: dev-h...@commons.apache.org >>>>> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >> For additional commands, e-mail: dev-h...@commons.apache.org >> >> --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org