On Mon, 13 Jul 2015 06:30:44 -0700, Phil Steitz wrote:
On 7/12/15 9:45 PM, Otmar Ertl wrote:
On Sun, Jul 12, 2015 at 8:16 PM, Gilles <gil...@harfang.homelinux.org> wrote:
On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote:
On 07/12/2015 04:58 PM, Phil Steitz wrote:
On 7/12/15 2:50 AM, Thomas Neidhart wrote:
On 07/11/2015 09:43 PM, Phil Steitz wrote:
On 7/11/15 12:29 PM, Thomas Neidhart wrote:
On 07/11/2015 09:08 PM, Phil Steitz wrote:
The code implemented in MATH-1242 to improve performance of KS monteCarloP in-lines efficient generation of random boolean arrays. Unfortunately, I think the implementation is not quite random (see comments on the ticket). To verify it, we need to be able to test the random boolean array generation directly. To do that, we have to either expose the method (at least as protected) in the KS class or add it somewhere else. I propose the latter but am not sure where to put it. For speed, we need to avoid array copies, so the API will have to be something like randomize(boolean[], nTrue). It could go in the swelling MathArrays class, or RandomDataGenerator. The latter probably makes more sense, but the API does not fit too
well.  Any ideas?
If it is just for testing purposes, you can also access the method in
question via reflection, see an example here:



http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes
Do you think it *should* be a private method of the K-S class?
Right now, I do not see much uses outside the class, but if we decide to make it public then I would prefer a special util class in the random
package to avoid cluttering the MathArrays class.

OK, for now we can make it private and use the reflection method
above to test it.

ok, but I guess it is also fine to make it package private as sebb
suggested. We did something similar recently for some of the improved
sampling methods provided by Otmar.

Regarding the RandomDataGenerator: I think this class should be
deprecated and replaced by a Sampler interface as proposed by Gilles.

Please consider keeping this class. Consider this a user request. I have quite a few applications that use this class for two reasons:

ok, the reason why I thought the class should be deprecated is because it was not kept up-to-date with all the new discrete and continuous distributions that we added in the last 2-3 years. If you think it is
useful, then we can keep it of course.

1.  One object instance tied to one PRNG that generates data from
multiple different distributions.   This is convenient.   Sure, I
could refactor all of these apps to instantiate new objects for each
type of generated data and hopefully still be able to peg them to
one PRNG; but that is needless work that also complicates the code.

2. There are quite a few methods in this class that have nothing to
do with sampling (nextPermutation, nextHexString, nextSecureXxx,
etc) but which conveniently share the RandomGenerator. I guess the
utility methods get moved out somewhere else.  Again, I end up
having to refactor all of my code that uses them and when I want
simulations to be based on a single PRNG, I have to find a way to
pass the RandomGenerator around to them.

I don't yet see the need to refactor the sampling support in the
distributions package; but as my own apps are not impacted by this,
if everyone else sees the user impact of the refactoring as
outweighed by the benefit, I won't stand in the way. Please lets
just keep the RandomDataGenerator convenience class in the random
package in any case. I will take care of whatever adjustments are
needed to adapt to whatever we settle on for sampling in the
distributions package.

Well, it is not really necessary to do everything together and refactor
the distributions.

Probably it is better to start the other way round, and describe what I
want to add, and see how other things fit in:

 * I want a generic Sampler interface, i.e. something like this:
 ** nextSample()
 ** nextSamples(int size)
 ** nextSamples(double[] samples)

+1

there could be a DiscreteSampler and ContinuousSampler interface to
handle the cases for int / double values.

Perhaps the name should be IntegerSampler and DoubleSampler, to accomodate
future needs (LongSampler, BooleanSampler (?)).
I would prefer consistent names for distributions and corresponding
samplers. The support of distributions must be of the same data type
as the return values of the corresponding sampler. Therefore, I would
call the samplers for RealDistribution and IntegerDistribution
RealSampler and IntegerSampler, respectively.

Ideally yes.  But there is always the lingering question of what "Real"
means: the mathematical abstraction or the numerical representation?
The same "real" distributions could be implemented with "float"s.
If we ever need implementing a discrete distribution that is able to
use the "long" range, the "Integer" in "IntegerSampler" will clearly
refer to the numerical type, not the mathematical abstraction of an
"integer" number...

Gilles


+1

Phil

The distributions could either be changed to return such a sampler as Gilles proposed (with the advantage that no random instance is tied to the distribution itself), or implement the interface directly (with the
advantage that we would not need to refactor too much).

Unless I'm missing something, the refactoring would be fairly the same: The latter case needs implementing 3 methods (2 new ones, one with a name
change).
The former needs implementing the factory method proposed in MATH-1158, plus the same methods as above (wrapped in the object returned by the
factory method).


Gilles



One can then create a sampler for any distribution or from other
sources, e.g. when needing a fast and efficient sampler without
replacement (see MATH-1239).

+1 for sequential sampling.  I don't follow exactly why that
requires refactoring the distributions; but if it helps in a way I
don't yet understand, that will help convince me that refactoring
sampling in the distributions package is worth the user pain.

as I said above, I wanted to combine two things in one step, maybe it is
better to go step by step.

Thomas



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Otmar


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to