On Mon, Jul 13, 2015 at 3:51 PM, Gilles <gil...@harfang.homelinux.org> wrote: > On Mon, 13 Jul 2015 06:30:44 -0700, Phil Steitz wrote: >> >> On 7/12/15 9:45 PM, Otmar Ertl wrote: >>> >>> On Sun, Jul 12, 2015 at 8:16 PM, Gilles <gil...@harfang.homelinux.org> >>> wrote: >>>> >>>> On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote: >>>>> >>>>> On 07/12/2015 04:58 PM, Phil Steitz wrote: >>>>>> >>>>>> On 7/12/15 2:50 AM, Thomas Neidhart wrote: >>>>>>> >>>>>>> On 07/11/2015 09:43 PM, Phil Steitz wrote: >>>>>>>> >>>>>>>> On 7/11/15 12:29 PM, Thomas Neidhart wrote: >>>>>>>>> >>>>>>>>> On 07/11/2015 09:08 PM, Phil Steitz wrote: >>>>>>>>>> >>>>>>>>>> The code implemented in MATH-1242 to improve performance of KS >>>>>>>>>> monteCarloP in-lines efficient generation of random boolean >>>>>>>>>> arrays. >>>>>>>>>> Unfortunately, I think the implementation is not quite random >>>>>>>>>> (see >>>>>>>>>> comments on the ticket). To verify it, we need to be able to test >>>>>>>>>> the random boolean array generation directly. To do that, we have >>>>>>>>>> to either expose the method (at least as protected) in the KS >>>>>>>>>> class >>>>>>>>>> or add it somewhere else. I propose the latter but am not sure >>>>>>>>>> where to put it. For speed, we need to avoid array copies, so the >>>>>>>>>> API will have to be something like randomize(boolean[], nTrue). >>>>>>>>>> It >>>>>>>>>> could go in the swelling MathArrays class, or RandomDataGenerator. >>>>>>>>>> The latter probably makes more sense, but the API does not fit too >>>>>>>>>> well. Any ideas? >>>>>>>>> >>>>>>>>> If it is just for testing purposes, you can also access the method >>>>>>>>> in >>>>>>>>> question via reflection, see an example here: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes >>>>>>>> >>>>>>>> Do you think it *should* be a private method of the K-S class? >>>>>>> >>>>>>> Right now, I do not see much uses outside the class, but if we decide >>>>>>> to >>>>>>> make it public then I would prefer a special util class in the random >>>>>>> package to avoid cluttering the MathArrays class. >>>>>> >>>>>> >>>>>> OK, for now we can make it private and use the reflection method >>>>>> above to test it. >>>>> >>>>> >>>>> ok, but I guess it is also fine to make it package private as sebb >>>>> suggested. We did something similar recently for some of the improved >>>>> sampling methods provided by Otmar. >>>>> >>>>>>> Regarding the RandomDataGenerator: I think this class should be >>>>>>> deprecated and replaced by a Sampler interface as proposed by Gilles. >>>>>> >>>>>> >>>>>> Please consider keeping this class. Consider this a user request. >>>>>> I have quite a few applications that use this class for two reasons: >>>>> >>>>> >>>>> ok, the reason why I thought the class should be deprecated is because >>>>> it was not kept up-to-date with all the new discrete and continuous >>>>> distributions that we added in the last 2-3 years. If you think it is >>>>> useful, then we can keep it of course. >>>>> >>>>>> 1. One object instance tied to one PRNG that generates data from >>>>>> multiple different distributions. This is convenient. Sure, I >>>>>> could refactor all of these apps to instantiate new objects for each >>>>>> type of generated data and hopefully still be able to peg them to >>>>>> one PRNG; but that is needless work that also complicates the code. >>>>>> >>>>>> 2. There are quite a few methods in this class that have nothing to >>>>>> do with sampling (nextPermutation, nextHexString, nextSecureXxx, >>>>>> etc) but which conveniently share the RandomGenerator. I guess the >>>>>> utility methods get moved out somewhere else. Again, I end up >>>>>> having to refactor all of my code that uses them and when I want >>>>>> simulations to be based on a single PRNG, I have to find a way to >>>>>> pass the RandomGenerator around to them. >>>>>> >>>>>> I don't yet see the need to refactor the sampling support in the >>>>>> distributions package; but as my own apps are not impacted by this, >>>>>> if everyone else sees the user impact of the refactoring as >>>>>> outweighed by the benefit, I won't stand in the way. Please lets >>>>>> just keep the RandomDataGenerator convenience class in the random >>>>>> package in any case. I will take care of whatever adjustments are >>>>>> needed to adapt to whatever we settle on for sampling in the >>>>>> distributions package. >>>>> >>>>> >>>>> Well, it is not really necessary to do everything together and refactor >>>>> the distributions. >>>>> >>>>> Probably it is better to start the other way round, and describe what I >>>>> want to add, and see how other things fit in: >>>>> >>>>> * I want a generic Sampler interface, i.e. something like this: >>>>> ** nextSample() >>>>> ** nextSamples(int size) >>>>> ** nextSamples(double[] samples) >>>> >>>> >>>> +1 >>>> Do we really need all those 3 methods? If the functionality provided by the two latter methods is essential for convenience reasons, we could also offer utility functions that are able to fill an array with random numbers from a given sampler, e.g. MathArrays.fill(array, sampler) or array=MathArrays.generate(sampler, size).
>>>>> there could be a DiscreteSampler and ContinuousSampler interface to >>>>> handle the cases for int / double values. >>>> >>>> >>>> Perhaps the name should be IntegerSampler and DoubleSampler, to >>>> accomodate >>>> future needs (LongSampler, BooleanSampler (?)). >>> >>> I would prefer consistent names for distributions and corresponding >>> samplers. The support of distributions must be of the same data type >>> as the return values of the corresponding sampler. Therefore, I would >>> call the samplers for RealDistribution and IntegerDistribution >>> RealSampler and IntegerSampler, respectively. > > > Ideally yes. But there is always the lingering question of what "Real" > means: the mathematical abstraction or the numerical representation? > The same "real" distributions could be implemented with "float"s. > If we ever need implementing a discrete distribution that is able to > use the "long" range, the "Integer" in "IntegerSampler" will clearly > refer to the numerical type, not the mathematical abstraction of an > "integer" number... > > Gilles > > The underlying data type of RealDistribution implementations is implicitly defined to be "double". It would not make sense that methods such as getSupportLowerBound() return a different type than the associated sample method. Therefore, in the context of RealDistribution "real" means "double" at the moment. However, I am open for renaming RealDistribution as DoubleDistribution with a sampler named DoubleSampler. If a float-based implementation is really ever needed, interfaces named FloatDistribution and FloatSampler could be introduced. > >> +1 >> >> Phil >>> >>> >>>>> The distributions could either be changed to return such a sampler as >>>>> Gilles proposed (with the advantage that no random instance is tied to >>>>> the distribution itself), or implement the interface directly (with the >>>>> advantage that we would not need to refactor too much). >>>> >>>> >>>> Unless I'm missing something, the refactoring would be fairly the same: >>>> The latter case needs implementing 3 methods (2 new ones, one with a >>>> name >>>> change). >>>> The former needs implementing the factory method proposed in MATH-1158, >>>> plus the same methods as above (wrapped in the object returned by the >>>> factory method). >>>> >>>> >>>> Gilles >>>> >>>> >>>> >>>>>>> One can then create a sampler for any distribution or from other >>>>>>> sources, e.g. when needing a fast and efficient sampler without >>>>>>> replacement (see MATH-1239). >>>>>> >>>>>> >>>>>> +1 for sequential sampling. I don't follow exactly why that >>>>>> requires refactoring the distributions; but if it helps in a way I >>>>>> don't yet understand, that will help convince me that refactoring >>>>>> sampling in the distributions package is worth the user pain. >>>>> >>>>> >>>>> as I said above, I wanted to combine two things in one step, maybe it >>>>> is >>>>> better to go step by step. >>>>> >>>>> Thomas >>>> >>>> >>>> >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>>> For additional commands, e-mail: dev-h...@commons.apache.org >>>> >>> Otmar >>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>> For additional commands, e-mail: dev-h...@commons.apache.org >>> >>> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >> For additional commands, e-mail: dev-h...@commons.apache.org > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > Otmar