On 7/12/15 2:50 AM, Thomas Neidhart wrote:
> On 07/11/2015 09:43 PM, Phil Steitz wrote:
>> On 7/11/15 12:29 PM, Thomas Neidhart wrote:
>>> On 07/11/2015 09:08 PM, Phil Steitz wrote:
>>>> The code implemented in MATH-1242 to improve performance of KS
>>>> monteCarloP in-lines efficient generation of random boolean arrays.
>>>>   Unfortunately, I think the implementation is not quite random (see
>>>> comments on the ticket).  To verify it, we need to be able to test
>>>> the random boolean array generation directly.  To do that, we have
>>>> to either expose the method (at least as protected) in the KS class
>>>> or add it somewhere else.  I propose the latter but am not sure
>>>> where to put it.  For speed, we need to avoid array copies, so the
>>>> API will have to be something like randomize(boolean[], nTrue).  It
>>>> could go in the swelling MathArrays class, or RandomDataGenerator. 
>>>> The latter probably makes more sense, but the API does not fit too
>>>> well.  Any ideas?
>>> If it is just for testing purposes, you can also access the method in
>>> question via reflection, see an example here:
>>> http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes
>> Do you think it *should* be a private method of the K-S class?
> Right now, I do not see much uses outside the class, but if we decide to
> make it public then I would prefer a special util class in the random
> package to avoid cluttering the MathArrays class.

OK, for now we can make it private and use the reflection method
above to test it.
>
> Regarding the RandomDataGenerator: I think this class should be
> deprecated and replaced by a Sampler interface as proposed by Gilles.

Please consider keeping this class.  Consider this a user request. 
I have quite a few applications that use this class for two reasons:

1.  One object instance tied to one PRNG that generates data from
multiple different distributions.   This is convenient.   Sure, I
could refactor all of these apps to instantiate new objects for each
type of generated data and hopefully still be able to peg them to
one PRNG; but that is needless work that also complicates the code.

2.  There are quite a few methods in this class that have nothing to
do with sampling (nextPermutation, nextHexString, nextSecureXxx,
etc) but which conveniently share the RandomGenerator.  I guess the
utility methods get moved out somewhere else.  Again, I end up
having to refactor all of my code that uses them and when I want
simulations to be based on a single PRNG, I have to find a way to
pass the RandomGenerator around to them.

I don't yet see the need to refactor the sampling support in the
distributions package; but as my own apps are not impacted by this,
if everyone else sees the user impact of the refactoring as
outweighed by the benefit, I won't stand in the way.   Please lets
just keep the RandomDataGenerator convenience class in the random
package in any case.  I will take care of whatever adjustments are
needed to adapt to whatever we settle on for sampling in the
distributions package.

> One can then create a sampler for any distribution or from other
> sources, e.g. when needing a fast and efficient sampler without
> replacement (see MATH-1239).

+1 for sequential sampling.  I don't follow exactly why that
requires refactoring the distributions; but if it helps in a way I
don't yet understand, that will help convince me that refactoring
sampling in the distributions package is worth the user pain.  

Phil
>
> I did not have time to complete a patch, but am working on it
>
> Thomas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to