On 7/12/15 9:45 PM, Otmar Ertl wrote: > On Sun, Jul 12, 2015 at 8:16 PM, Gilles <gil...@harfang.homelinux.org> wrote: >> On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote: >>> On 07/12/2015 04:58 PM, Phil Steitz wrote: >>>> On 7/12/15 2:50 AM, Thomas Neidhart wrote: >>>>> On 07/11/2015 09:43 PM, Phil Steitz wrote: >>>>>> On 7/11/15 12:29 PM, Thomas Neidhart wrote: >>>>>>> On 07/11/2015 09:08 PM, Phil Steitz wrote: >>>>>>>> The code implemented in MATH-1242 to improve performance of KS >>>>>>>> monteCarloP in-lines efficient generation of random boolean arrays. >>>>>>>> Unfortunately, I think the implementation is not quite random (see >>>>>>>> comments on the ticket). To verify it, we need to be able to test >>>>>>>> the random boolean array generation directly. To do that, we have >>>>>>>> to either expose the method (at least as protected) in the KS class >>>>>>>> or add it somewhere else. I propose the latter but am not sure >>>>>>>> where to put it. For speed, we need to avoid array copies, so the >>>>>>>> API will have to be something like randomize(boolean[], nTrue). It >>>>>>>> could go in the swelling MathArrays class, or RandomDataGenerator. >>>>>>>> The latter probably makes more sense, but the API does not fit too >>>>>>>> well. Any ideas? >>>>>>> If it is just for testing purposes, you can also access the method in >>>>>>> question via reflection, see an example here: >>>>>>> >>>>>>> >>>>>>> http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes >>>>>> Do you think it *should* be a private method of the K-S class? >>>>> Right now, I do not see much uses outside the class, but if we decide to >>>>> make it public then I would prefer a special util class in the random >>>>> package to avoid cluttering the MathArrays class. >>>> >>>> OK, for now we can make it private and use the reflection method >>>> above to test it. >>> >>> ok, but I guess it is also fine to make it package private as sebb >>> suggested. We did something similar recently for some of the improved >>> sampling methods provided by Otmar. >>> >>>>> Regarding the RandomDataGenerator: I think this class should be >>>>> deprecated and replaced by a Sampler interface as proposed by Gilles. >>>> >>>> Please consider keeping this class. Consider this a user request. >>>> I have quite a few applications that use this class for two reasons: >>> >>> ok, the reason why I thought the class should be deprecated is because >>> it was not kept up-to-date with all the new discrete and continuous >>> distributions that we added in the last 2-3 years. If you think it is >>> useful, then we can keep it of course. >>> >>>> 1. One object instance tied to one PRNG that generates data from >>>> multiple different distributions. This is convenient. Sure, I >>>> could refactor all of these apps to instantiate new objects for each >>>> type of generated data and hopefully still be able to peg them to >>>> one PRNG; but that is needless work that also complicates the code. >>>> >>>> 2. There are quite a few methods in this class that have nothing to >>>> do with sampling (nextPermutation, nextHexString, nextSecureXxx, >>>> etc) but which conveniently share the RandomGenerator. I guess the >>>> utility methods get moved out somewhere else. Again, I end up >>>> having to refactor all of my code that uses them and when I want >>>> simulations to be based on a single PRNG, I have to find a way to >>>> pass the RandomGenerator around to them. >>>> >>>> I don't yet see the need to refactor the sampling support in the >>>> distributions package; but as my own apps are not impacted by this, >>>> if everyone else sees the user impact of the refactoring as >>>> outweighed by the benefit, I won't stand in the way. Please lets >>>> just keep the RandomDataGenerator convenience class in the random >>>> package in any case. I will take care of whatever adjustments are >>>> needed to adapt to whatever we settle on for sampling in the >>>> distributions package. >>> >>> Well, it is not really necessary to do everything together and refactor >>> the distributions. >>> >>> Probably it is better to start the other way round, and describe what I >>> want to add, and see how other things fit in: >>> >>> * I want a generic Sampler interface, i.e. something like this: >>> ** nextSample() >>> ** nextSamples(int size) >>> ** nextSamples(double[] samples) >> >> +1 >> >>> there could be a DiscreteSampler and ContinuousSampler interface to >>> handle the cases for int / double values. >> >> Perhaps the name should be IntegerSampler and DoubleSampler, to accomodate >> future needs (LongSampler, BooleanSampler (?)). > I would prefer consistent names for distributions and corresponding > samplers. The support of distributions must be of the same data type > as the return values of the corresponding sampler. Therefore, I would > call the samplers for RealDistribution and IntegerDistribution > RealSampler and IntegerSampler, respectively.
+1 Phil > >>> The distributions could either be changed to return such a sampler as >>> Gilles proposed (with the advantage that no random instance is tied to >>> the distribution itself), or implement the interface directly (with the >>> advantage that we would not need to refactor too much). >> >> Unless I'm missing something, the refactoring would be fairly the same: >> The latter case needs implementing 3 methods (2 new ones, one with a name >> change). >> The former needs implementing the factory method proposed in MATH-1158, >> plus the same methods as above (wrapped in the object returned by the >> factory method). >> >> >> Gilles >> >> >> >>>>> One can then create a sampler for any distribution or from other >>>>> sources, e.g. when needing a fast and efficient sampler without >>>>> replacement (see MATH-1239). >>>> >>>> +1 for sequential sampling. I don't follow exactly why that >>>> requires refactoring the distributions; but if it helps in a way I >>>> don't yet understand, that will help convince me that refactoring >>>> sampling in the distributions package is worth the user pain. >>> >>> as I said above, I wanted to combine two things in one step, maybe it is >>> better to go step by step. >>> >>> Thomas >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >> For additional commands, e-mail: dev-h...@commons.apache.org >> > Otmar > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org