On 7/12/15 2:50 AM, Thomas Neidhart wrote: > On 07/11/2015 09:43 PM, Phil Steitz wrote: >> On 7/11/15 12:29 PM, Thomas Neidhart wrote: >>> On 07/11/2015 09:08 PM, Phil Steitz wrote: >>>> The code implemented in MATH-1242 to improve performance of KS >>>> monteCarloP in-lines efficient generation of random boolean arrays. >>>> Unfortunately, I think the implementation is not quite random (see >>>> comments on the ticket). To verify it, we need to be able to test >>>> the random boolean array generation directly. To do that, we have >>>> to either expose the method (at least as protected) in the KS class >>>> or add it somewhere else. I propose the latter but am not sure >>>> where to put it. For speed, we need to avoid array copies, so the >>>> API will have to be something like randomize(boolean[], nTrue). It >>>> could go in the swelling MathArrays class, or RandomDataGenerator. >>>> The latter probably makes more sense, but the API does not fit too >>>> well. Any ideas? >>> If it is just for testing purposes, you can also access the method in >>> question via reflection, see an example here: >>> http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes >> Do you think it *should* be a private method of the K-S class? > Right now, I do not see much uses outside the class, but if we decide to > make it public then I would prefer a special util class in the random > package to avoid cluttering the MathArrays class.
OK, for now we can make it private and use the reflection method above to test it. > > Regarding the RandomDataGenerator: I think this class should be > deprecated and replaced by a Sampler interface as proposed by Gilles. Please consider keeping this class. Consider this a user request. I have quite a few applications that use this class for two reasons: 1. One object instance tied to one PRNG that generates data from multiple different distributions. This is convenient. Sure, I could refactor all of these apps to instantiate new objects for each type of generated data and hopefully still be able to peg them to one PRNG; but that is needless work that also complicates the code. 2. There are quite a few methods in this class that have nothing to do with sampling (nextPermutation, nextHexString, nextSecureXxx, etc) but which conveniently share the RandomGenerator. I guess the utility methods get moved out somewhere else. Again, I end up having to refactor all of my code that uses them and when I want simulations to be based on a single PRNG, I have to find a way to pass the RandomGenerator around to them. I don't yet see the need to refactor the sampling support in the distributions package; but as my own apps are not impacted by this, if everyone else sees the user impact of the refactoring as outweighed by the benefit, I won't stand in the way. Please lets just keep the RandomDataGenerator convenience class in the random package in any case. I will take care of whatever adjustments are needed to adapt to whatever we settle on for sampling in the distributions package. > One can then create a sampler for any distribution or from other > sources, e.g. when needing a fast and efficient sampler without > replacement (see MATH-1239). +1 for sequential sampling. I don't follow exactly why that requires refactoring the distributions; but if it helps in a way I don't yet understand, that will help convince me that refactoring sampling in the distributions package is worth the user pain. Phil > > I did not have time to complete a patch, but am working on it > > Thomas > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org