Re: [math] Thread-safety and initialization overhead in distribution sample() methods

Phil Steitz Wed, 08 Oct 2014 08:50:10 -0700

On 10/8/14 5:33 AM, Thomas Neidhart wrote:
> Hi,
>
> in my use-cases I also use only 1 random generator with a fixed seed which
> is important property as in case of simulations you want to have a
> deterministic behavior for all random decisions.


I tend to do the same thing.
>
> I am not sure if we should invest effort to make the existing random
> generators thread-safe, as there is already the SynchronizedRandomGenerator
> wrapper that makes any RandomGenerator instance thread-safe.e 

I agree here and I would add that when you think about it,
threadsafety is actually hard to *define* for PRNGs and devilishly
hard to implement if defined strictly.  If you interpret the contact
of the Well or Mersenne generators, for example, to be that they
generate sequences as defined by those algorithms, a thread-safe
implementation would have to satisfy the property that every
subsequence behaves like a directly generated sequence.   Note also
that concurrent access destroys determinism via fixed seeds.  So
while just slowing things down by adding syncs can avoid runtime
issues, what exactly it does to the random sequences and in
particular whether they have the properties of directly generated
sequences is not at all clear. 
> So a plain random generator instance is not thread-safe, but fast, while
> you can make every instance thread-safe by wrapping it, This is the
> flexibility that I would be looking for.
>
> Regarding the additional sample method:
>
> Personally, I would setup a distribution object with a given rng as it is
> more convenient to use afterwards as Phil pointed out, but I am also fine
> with a sample(RandomGenerator) method.

I agree here too.  I like Luc's idea (sorry if I am misconstruing
it) of making the no-arg sample() *require* that the distribution
constructor has been supplied with a RandomGenerator and changing
the no-arg distribution constructor to do nothing (so you get NPE on
sample() as you do now with the null constructor argument
workaround).  This change will have to wait to 4.0 though.
>
> The problem is just that if we add this new method, there are basically two
> ways to do sampling in CM, and a user must know exactly that if he/she uses
> the second way (by providing a rng as parameter), the distribution instance
> should be created with a null rng to avoid unnecessary memory consumption.
> This just adds more confusion, and I am not really sure it is worth the
> effort.
>
> If we stick with the decision that sampling is part of a distribution, then
> I would rather remove the old sample() method and the internal rng, and add
> the new sample(RandomGenerator) method.
> Either now by already deprecating the existing methods, or by waiting till
> 4.0.

I could live with that too, though it would break code that uses the
constructor method.  I don't see harm in supporting both.
>
> As I understand that this is a tricky decision, I created the patch which
> improves at least the situation for the inference tests immediately with a
> clean solution.

Definitely +1 to apply your patch to fix the tests and maybe add
some doc somewhere to make the workaround clearer to people using
the distribution classes without sampling.

Phil
>
> Thomas
>
> On Wed, Oct 8, 2014 at 2:11 PM, Luc Maisonobe <l...@spaceroots.org> wrote:
>
>> Hi Phil,
>>
>> Le 08/10/2014 02:38, Phil Steitz a écrit :
>>> Comments in MATH-1154 (bad performance in test code) and the
>>> now-reopened MATH-1124 (slow initialization) point to the need to
>>> fix the problem we created when we moved sample() to the
>>> distribution classes with PRNG provided by a final field with
>>> default implementation initialized by the constructor.  Several
>>> suggestions have been made to improve things.
>>>
>>> 0) require that sample() take a RandomGenerator as an additional
>>> parameter
>>> 1) improve initialization performance of the default RandomGenerator
>>> 2) Replace the default with a generator with negligible
>>> initialization latency
>>> 3) Make either PRNG initialization or initialization of the
>>> RandomGenerator field lazy
>>>
>>> I may have missed some.  One thing to note is that the (default)
>>> Well generators are *not* threadsafe, so having things final,
>>> avoiding lazy init is not really buying us anything in the current
>>> setup.  So unless 2) is done with a threadsafe generator, only 0)
>>> really makes sample() threadsafe (assuming the caller protects the
>>> instance that needs to be re-supplied on each call).
>>>
>>> I am +0 for adding a new method that takes a generator as an
>>> additional actual parameter, but as a user I like the convenience of
>>> just calling sample().  I never use distributions as singletons, so
>>> the lack of threadsafety does not concern me.  It seems then that a
>>> reasonable approach might be to add the new method, keep the old one
>>> but move to lazy initialization with documentation that when you use
>>> the default, the provided PRNG it is not threadsafe.  I would also
>>> obviously be in favor or anything we can do in 1).
>> Having a default generator is not really something I am comfortable
>> with. I consider the generator is really of the responsibility of the
>> caller and not a low level library. In my use cases, I even create only
>> one generator for a run and reuse it in all the classes that need a
>> generator rather than building different ones: the reason for that is
>> that it greatly simplifies initialization (only one seed) and that as
>> some generators may have some latency to reach a steady state[1], I
>> prefer to have one generator with more drawings extracted from it than
>> several independent generators.
>>
>> So my preferences would be towards solution 0 for this specific case in
>> the list you provide. A side effect is that I don't think the original
>> patch to MATH-1154 should be applied. As is, this patch hides even more
>> the generator, and while preserving the final field in the top class, it
>> does so by moving it in another class. If the fact this field is final
>> is a problem, then we should lift this restriction and do it in the
>> open, rather than hiding our tracks from class to class.
>>
>> Another choice would be to simply document that if the user explicitly
>> sets up a null generator, then sampling will fail. This is essentially
>> what Thomas patch does if I understand it. I would even go further:
>> either deprecate the constructors without random generator (and simply
>> explain in the remaining constructors that this generator can be set to
>> null if no sampling is desired), or lets this constructors but have them
>> set the generator to null (and document it). I *hate* having these new
>> Well19937c() calls hidden without user knowledge. Creating random
>> generators without seed is really something we should avoid.
>>
>> Apart from that, improving performances of the WELL generators is
>> certainly a good thing.
>>
>> Still another point would be to make random generators thread safe. It
>> would be important in my use cases since I *do* share generators and it
>> is intentional.
>>
>> best regards,
>> Luc
>>
>> [1] the first few millions drawings are not as good as the later ones,
>> see the paper on the WELL generator for examples
>>
>>> Phil
>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>>> For additional commands, e-mail: dev-h...@commons.apache.org
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>> For additional commands, e-mail: dev-h...@commons.apache.org
>>
>>



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [math] Thread-safety and initialization overhead in distribution sample() methods

Reply via email to