> On 4 May 2019, at 22:34, Gilles Sadowski <gillese...@gmail.com> wrote:
> 
> Hi.
> 
> On Sat, 4 May 2019 at 21:31, Alex Herbert <alex.d.herb...@gmail.com> wrote:
>> 
>> 
>> 
>>> On 4 May 2019, at 14:46, Gilles Sadowski <gillese...@gmail.com> wrote:
>>> 
>>> Hello.
>>> 
>>> On Fri, 3 May 2019 at 16:57, Alex Herbert <alex.d.herb...@gmail.com> wrote:
>>>> 
>>>> Most of the samplers in the library have very small states that are easy
>>>> to compute. Some have computations that are more expensive, such as the
>>>> LargeMeanPoissonSampler or the DiscreteProbabilityCollectionSampler.
>>>> 
>>>> However once the state is computed the only part of the state that
>>>> changes is the RNG. I would like to suggest a way to copy samplers as
>>>> something like:
>>>> 
>>>> DiscreteSampler newInstance(UniformRandomProvider)
>>>> 
>>>> The new instance would share all the private state of the first sampler
>>>> except the RNG. This can be used for multi-threaded applications which
>>>> require a new sampler per thread but sample from the same distribution.
>>>> 
>>>> A particular case in point is the as yet not integrated
>>>> MarsagliaTsangWangSmallMeanPoissonSampler (see RNG-91 [1]) which has a
>>>> "large" state [2] that takes a "long" time [3] to compute but is
>>>> effectively immutable. This could be shared across instances saving
>>>> memory for parallel application.
>>>> 
>>>> A copy instance would be almost zero set-up time and provide opportunity
>>>> for caching of commonly used samplers.
>>> 
>>> The goal is sharing (immutable) state so it seems that the semantics is
>>> not "copy".
>>> 
>>> Isn't it a "factory" that we are after?  E.g. something like:
>>> public final class CachedSamplingFactory {
>>>   private static PoissonSamplerCache poisson = new PoissonSamplerCache();
>>> 
>>>   public PoissonSampler createPoissonSampler(UniformRandomProvider rng,
>>>                                              double mean) {
>>>       if (!poisson.isCached(mean)) {
>>>           // Initialize (requires synchronization) ...
>>>           poisson.createCache(mean);
>>>       }
>>>       // Construct using pre-built state.
>>>       return new PoissonSampler(poisson.getCache(mean), rng);
>>>   }
>>> }
>>> [It may be overkill, more work, and less performant…]
>> 
>> But you need a factory for every class you want to share state for. And the 
>> factory actually has to look in a cache. If you operate on an instance then 
>> you get what you want: another working version of the same sampler. It would 
>> also be thread safe without synchronisation as long as the state is 
>> immutable. The only mutable state is the passed-in RNG.
> 
> Agreed.  It was what I meant by the last sentence.
> 
>>> 
>>> IIUC, you suggest to add "newInstance" in the "DiscreteSampler" interface 
>>> (?).
>> 
>> I did think of extending DiscreteSampler with this functionality. Not adding 
>> to the interface as it currently is ‘functional’ as it has only one method. 
>> I think that should not change. Having thought about it a bit more I like 
>> the idea of a new functional interface. Perhaps:
>> 
>> interface DiscreteSamplerProvider {
>>    DiscreteSampler create(UniformRandomProvider rng);
>> }
>> 
>> ‘Provider’ could be replaced by:
>> 
>> - Generator
>> - Supplier (possible clash or alignment with Java 8 depending on the way it 
>> is done)
>> - Factory (though the method is not static so I do not like this)
>> - etc
>> 
>> So this then becomes a functional interface that can be used by anything. 
>> However instances of a sampler would be expected to return a sampler 
>> matching their own functionality.
>> 
>> Note there are some samplers not implementing an interface that also could 
>> benefit from this. Namely CollectionSampler and 
>> DiscreteProbabilityCollectionSampler. So does this need a generic interface:
>> 
>> interface Sampler<T> {
>>    T sample();
>> }
>> 
>> To be complemented with:
>> 
>> interface SamplerProvider<T> {
>>    Sampler<T> create(UniformRandomProvider rng);
>> }
>> 
>> So the library would require:
>> 
>> SamplerProvider<T>
>> DiscreteSamplerProvider
>> ContinuousSamplerProvider
>> 
>> Any sampler can choose to implement being a Provider. There are some cases 
>> where it is moot. For example a ZigguratNormalizedGaussianSampler just 
>> stores the rng in the constructor. However it could still be a Provider; 
>> the method would simply call the constructor. It would allow writing a generic 
>> multi-threaded application that just uses e.g. a DiscreteSamplerProvider to 
>> create samplers for each thread. You can then drop in the actual 
>> implementation you require. For example you could swap the type of 
>> PoissonSampler in your simulation by swapping the provider for the Poisson 
>> distribution.
>> 
>> How does that sound?
> 
> Fine to have
>  DiscreteSamplerProvider
>  ContinuousSamplerProvider
> [Perhaps the "Supplier" suffix would be better to avoid confusion with
> "UniformRandomProvider".]
> 
> At first sight, I don't think that the generic interface would have
> any actual use since, ultimately, the return value of "sample()"
> will be either "int" or "double" (no polymorphism).
> 

The generic interface is for the samplers that are typed for collections and 
currently return a sample T, or those that return arrays. It would not be for 
Integer or Double from the probability distribution samplers. Here is what 
could use it:

CombinationSampler implements Sampler<int[]>
PermutationSampler implements Sampler<int[]>
CollectionSampler implements Sampler<T>
DiscreteProbabilityCollectionSampler implements Sampler<T>

All are in the package org.apache.commons.rng.sampling.

Each could also implement SamplerSupplier<T>.
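
As a rough sketch of how the generic pair could look (this is not the library's 
API: java.util.Random stands in for UniformRandomProvider, and ListSampler is a 
made-up stand-in for CollectionSampler), something like:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical generic sampler, as proposed in this thread. */
interface Sampler<T> {
    T sample();
}

/** Hypothetical supplier of samplers bound to a given RNG. */
interface SamplerSupplier<T> {
    // Assumption: java.util.Random stands in for UniformRandomProvider.
    Sampler<T> create(Random rng);
}

/** Illustrative stand-in for CollectionSampler; not the library class. */
final class ListSampler<T> implements Sampler<T>, SamplerSupplier<T> {
    private final List<T> items; // unmodifiable, so safe to share
    private final Random rng;    // the only mutable state

    /** Copies the input once; this is the set-up cost. */
    ListSampler(Random rng, List<T> items) {
        this(Collections.unmodifiableList(new ArrayList<>(items)), rng);
    }

    /** Reuses an already prepared list without copying it. */
    private ListSampler(List<T> sharedItems, Random rng) {
        if (sharedItems.isEmpty()) {
            throw new IllegalArgumentException("Empty collection");
        }
        this.items = sharedItems;
        this.rng = rng;
    }

    @Override
    public T sample() {
        return items.get(rng.nextInt(items.size()));
    }

    @Override
    public Sampler<T> create(Random rng) {
        // Same list, different RNG.
        return new ListSampler<>(items, rng);
    }
}
```

Calling create(...) on an existing instance gives a second sampler over the same 
stored list, which is the per-thread use case discussed above.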

The set-up cost for the CombinationSampler/PermutationSampler would not be much 
different from the constructor and no state can be shared. No real benefit here 
other than convenience. But the two CollectionSamplers could share the final 
collection that is created and stored from the constructor input data. For the 
case of a large discrete probability collection sampler this could be a 
noticeable memory footprint as it also stores the cumulative distribution 
table. This would also save on the construction cost by not having to recompute 
it.
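
To illustrate the saving, here is a minimal sketch of a sampler that builds its 
cumulative table once and hands the same array to every new instance. The class 
name CumulativeTableSampler and the use of java.util.Random in place of 
UniformRandomProvider are assumptions for the example only; the real 
DiscreteProbabilityCollectionSampler may be organised differently.

```java
import java.util.Random;

/** Illustrative only; not DiscreteProbabilityCollectionSampler itself. */
final class CumulativeTableSampler {
    private final double[] cumulative; // built once, never modified afterwards
    private final Random rng;          // per instance, the only mutable state

    private CumulativeTableSampler(double[] cumulative, Random rng) {
        this.cumulative = cumulative;
        this.rng = rng;
    }

    /** Pays the set-up cost: builds the normalised cumulative table. */
    static CumulativeTableSampler of(Random rng, double[] probabilities) {
        double[] table = new double[probabilities.length];
        double sum = 0;
        for (int i = 0; i < table.length; i++) {
            sum += probabilities[i];
            table[i] = sum;
        }
        if (!(sum > 0)) {
            throw new IllegalArgumentException("Probabilities must sum to a positive value");
        }
        for (int i = 0; i < table.length; i++) {
            table[i] /= sum; // the last entry becomes exactly 1.0
        }
        return new CumulativeTableSampler(table, rng);
    }

    /** No table is recomputed: the new instance shares it and differs only in the RNG. */
    CumulativeTableSampler create(Random newRng) {
        return new CumulativeTableSampler(cumulative, newRng);
    }

    /** Returns the first index whose cumulative probability exceeds a uniform deviate. */
    int sample() {
        double u = rng.nextDouble(); // in [0, 1)
        int lo = 0;
        int hi = cumulative.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (u < cumulative[mid]) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }
}
```

Each thread would call create(...) with its own RNG; the table is only read 
after construction, so no synchronisation is needed on it.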

Alex


> Gilles
> 
>> 
>> Alex
>> 
>> 
>> 
>>> I'm a bit wary that this would compound two different functionalities:
>>> * data generator (method "sample"),
>>> * generator generator (method "newInstance").
>>> [But I currently don't have an example where this would be a problem.]
>>> 
>>> Regards,
>>> Gilles
>>> 
>>>> Alex
>>>> 
>>>> [1] https://issues.apache.org/jira/browse/RNG-91
>>>> 
>>>> [2] kB, or possibly MB, of tabulated data
>>>> 
>>>> [3] Set-up cost for this Poisson sampler is on the order of 30 to 165
>>>> times that of a SmallMeanPoissonSampler for means of 2 and 32. Note
>>>> however that construction still takes only 1.1 and 4.5 microseconds for
>>>> the "long" time.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
> 

