On Mon, Jul 13, 2015 at 3:51 PM, Gilles <gil...@harfang.homelinux.org>
wrote:
> On Mon, 13 Jul 2015 06:30:44 -0700, Phil Steitz wrote:
>>
>> On 7/12/15 9:45 PM, Otmar Ertl wrote:
>>>
>>> On Sun, Jul 12, 2015 at 8:16 PM, Gilles <gil...@harfang.homelinux.org>
>>> wrote:
>>>>
>>>> On Sun, 12 Jul 2015 19:38:45 +0200, Thomas Neidhart wrote:
>>>>>
>>>>> On 07/12/2015 04:58 PM, Phil Steitz wrote:
>>>>>>
>>>>>> On 7/12/15 2:50 AM, Thomas Neidhart wrote:
>>>>>>>
>>>>>>> On 07/11/2015 09:43 PM, Phil Steitz wrote:
>>>>>>>>
>>>>>>>> On 7/11/15 12:29 PM, Thomas Neidhart wrote:
>>>>>>>>>
>>>>>>>>> On 07/11/2015 09:08 PM, Phil Steitz wrote:
>>>>>>>>>>
>>>>>>>>>> The code implemented in MATH-1242 to improve performance of KS
>>>>>>>>>> monteCarloP in-lines efficient generation of random boolean
>>>>>>>>>> arrays.
>>>>>>>>>> Unfortunately, I think the implementation is not quite random
>>>>>>>>>> (see
>>>>>>>>>> comments on the ticket). To verify it, we need to be able to test
>>>>>>>>>> the random boolean array generation directly. To do that, we have
>>>>>>>>>> to either expose the method (at least as protected) in the KS
>>>>>>>>>> class
>>>>>>>>>> or add it somewhere else. I propose the latter but am not sure
>>>>>>>>>> where to put it. For speed, we need to avoid array copies, so the
>>>>>>>>>> API will have to be something like randomize(boolean[], nTrue).
>>>>>>>>>> It
>>>>>>>>>> could go in the swelling MathArrays class, or
RandomDataGenerator.
>>>>>>>>>> The latter probably makes more sense, but the API does not fit
too
>>>>>>>>>> well. Any ideas?
>>>>>>>>>
>>>>>>>>> If it is just for testing purposes, you can also access the method
>>>>>>>>> in
>>>>>>>>> question via reflection, see an example here:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
http://stackoverflow.com/questions/34571/how-to-test-a-class-that-has-private-methods-fields-or-inner-classes
>>>>>>>>
>>>>>>>> Do you think it *should* be a private method of the K-S class?
>>>>>>>
>>>>>>> Right now, I do not see much uses outside the class, but if we
decide
>>>>>>> to
>>>>>>> make it public then I would prefer a special util class in the
random
>>>>>>> package to avoid cluttering the MathArrays class.
>>>>>>
>>>>>>
>>>>>> OK, for now we can make it private and use the reflection method
>>>>>> above to test it.
>>>>>
>>>>>
>>>>> ok, but I guess it is also fine to make it package private as sebb
>>>>> suggested. We did something similar recently for some of the improved
>>>>> sampling methods provided by Otmar.
>>>>>
>>>>>>> Regarding the RandomDataGenerator: I think this class should be
>>>>>>> deprecated and replaced by a Sampler interface as proposed by
Gilles.
>>>>>>
>>>>>>
>>>>>> Please consider keeping this class. Consider this a user request.
>>>>>> I have quite a few applications that use this class for two reasons:
>>>>>
>>>>>
>>>>> ok, the reason why I thought the class should be deprecated is because
>>>>> it was not kept up-to-date with all the new discrete and continuous
>>>>> distributions that we added in the last 2-3 years. If you think it is
>>>>> useful, then we can keep it of course.
>>>>>
>>>>>> 1. One object instance tied to one PRNG that generates data from
>>>>>> multiple different distributions. This is convenient. Sure, I
>>>>>> could refactor all of these apps to instantiate new objects for each
>>>>>> type of generated data and hopefully still be able to peg them to
>>>>>> one PRNG; but that is needless work that also complicates the code.
>>>>>>
>>>>>> 2. There are quite a few methods in this class that have nothing to
>>>>>> do with sampling (nextPermutation, nextHexString, nextSecureXxx,
>>>>>> etc) but which conveniently share the RandomGenerator. I guess the
>>>>>> utility methods get moved out somewhere else. Again, I end up
>>>>>> having to refactor all of my code that uses them and when I want
>>>>>> simulations to be based on a single PRNG, I have to find a way to
>>>>>> pass the RandomGenerator around to them.
>>>>>>
>>>>>> I don't yet see the need to refactor the sampling support in the
>>>>>> distributions package; but as my own apps are not impacted by this,
>>>>>> if everyone else sees the user impact of the refactoring as
>>>>>> outweighed by the benefit, I won't stand in the way. Please lets
>>>>>> just keep the RandomDataGenerator convenience class in the random
>>>>>> package in any case. I will take care of whatever adjustments are
>>>>>> needed to adapt to whatever we settle on for sampling in the
>>>>>> distributions package.
>>>>>
>>>>>
>>>>> Well, it is not really necessary to do everything together and
refactor
>>>>> the distributions.
>>>>>
>>>>> Probably it is better to start the other way round, and describe what
I
>>>>> want to add, and see how other things fit in:
>>>>>
>>>>> * I want a generic Sampler interface, i.e. something like this:
>>>>> ** nextSample()
>>>>> ** nextSamples(int size)
>>>>> ** nextSamples(double[] samples)
>>>>
>>>>
>>>> +1
>>>>
Do we really need all those 3 methods? If the functionality provided by the
two latter methods is essential for convenience reasons, we could also
offer utility functions that are able to fill an array with random numbers
from a given sampler, e.g. MathArrays.fill(array, sampler) or
array=MathArrays.generate(sampler, size).

>>>>> there could be a DiscreteSampler and ContinuousSampler interface to
>>>>> handle the cases for int / double values.
>>>>
>>>>
>>>> Perhaps the name should be IntegerSampler and DoubleSampler, to
>>>> accomodate
>>>> future needs (LongSampler, BooleanSampler (?)).
>>>
>>> I would prefer consistent names for distributions and corresponding
>>> samplers. The support of distributions must be of the same data type
>>> as the return values of the corresponding sampler. Therefore, I would
>>> call the samplers for RealDistribution and IntegerDistribution
>>> RealSampler and IntegerSampler, respectively.
>
>
> Ideally yes. But there is always the lingering question of what "Real"
> means: the mathematical abstraction or the numerical representation?
> The same "real" distributions could be implemented with "float"s.
> If we ever need implementing a discrete distribution that is able to
> use the "long" range, the "Integer" in "IntegerSampler" will clearly
> refer to the numerical type, not the mathematical abstraction of an
> "integer" number...
>
> Gilles
>
>

The underlying data type of RealDistribution implementations is implicitly
defined to be "double". It would not make sense that methods such as
getSupportLowerBound() return a different type than the associated sample
method. Therefore, in the context of RealDistribution "real" means "double"
at the moment. However, I am open for renaming RealDistribution as
DoubleDistribution with a sampler named DoubleSampler. If a float-based
implementation is really ever needed, interfaces named FloatDistribution
and FloatSampler could be introduced.
>

>> +1
>>
>> Phil
>>>
>>>
>>>>> The distributions could either be changed to return such a sampler as
>>>>> Gilles proposed (with the advantage that no random instance is tied to
>>>>> the distribution itself), or implement the interface directly (with
the
>>>>> advantage that we would not need to refactor too much).
>>>>
>>>>
>>>> Unless I'm missing something, the refactoring would be fairly the same:
>>>> The latter case needs implementing 3 methods (2 new ones, one with a
>>>> name
>>>> change).
>>>> The former needs implementing the factory method proposed in MATH-1158,
>>>> plus the same methods as above (wrapped in the object returned by the
>>>> factory method).
>>>>
>>>>
>>>> Gilles
>>>>
>>>>
>>>>
>>>>>>> One can then create a sampler for any distribution or from other
>>>>>>> sources, e.g. when needing a fast and efficient sampler without
>>>>>>> replacement (see MATH-1239).
>>>>>>
>>>>>>
>>>>>> +1 for sequential sampling. I don't follow exactly why that
>>>>>> requires refactoring the distributions; but if it helps in a way I
>>>>>> don't yet understand, that will help convince me that refactoring
>>>>>> sampling in the distributions package is worth the user pain.
>>>>>
>>>>>
>>>>> as I said above, I wanted to combine two things in one step, maybe it
>>>>> is
>>>>> better to go step by step.
>>>>>
>>>>> Thomas
>>>>
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>>>> For additional commands, e-mail: dev-h...@commons.apache.org
>>>>
>>> Otmar
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>>> For additional commands, e-mail: dev-h...@commons.apache.org
>>>
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>> For additional commands, e-mail: dev-h...@commons.apache.org
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>

Otmar

Reply via email to