On 10/8/14 5:33 AM, Thomas Neidhart wrote: > Hi, > > in my use-cases I also use only 1 random generator with a fixed seed which > is important property as in case of simulations you want to have a > deterministic behavior for all random decisions.
I tend to do the same thing. > > I am not sure if we should invest effort to make the existing random > generators thread-safe, as there is already the SynchronizedRandomGenerator > wrapper that makes any RandomGenerator instance thread-safe.e I agree here and I would add that when you think about it, threadsafety is actually hard to *define* for PRNGs and devilishly hard to implement if defined strictly. If you interpret the contact of the Well or Mersenne generators, for example, to be that they generate sequences as defined by those algorithms, a thread-safe implementation would have to satisfy the property that every subsequence behaves like a directly generated sequence. Note also that concurrent access destroys determinism via fixed seeds. So while just slowing things down by adding syncs can avoid runtime issues, what exactly it does to the random sequences and in particular whether they have the properties of directly generated sequences is not at all clear. > So a plain random generator instance is not thread-safe, but fast, while > you can make every instance thread-safe by wrapping it, This is the > flexibility that I would be looking for. > > Regarding the additional sample method: > > Personally, I would setup a distribution object with a given rng as it is > more convenient to use afterwards as Phil pointed out, but I am also fine > with a sample(RandomGenerator) method. I agree here too. I like Luc's idea (sorry if I am misconstruing it) of making the no-arg sample() *require* that the distribution constructor has been supplied with a RandomGenerator and changing the no-arg distribution constructor to do nothing (so you get NPE on sample() as you do now with the null constructor argument workaround). This change will have to wait to 4.0 though. > > The problem is just that if we add this new method, there are basically two > ways to do sampling in CM, and a user must know exactly that if he/she uses > the second way (by providing a rng as parameter), the distribution instance > should be created with a null rng to avoid unnecessary memory consumption. > This just adds more confusion, and I am not really sure it is worth the > effort. > > If we stick with the decision that sampling is part of a distribution, then > I would rather remove the old sample() method and the internal rng, and add > the new sample(RandomGenerator) method. > Either now by already deprecating the existing methods, or by waiting till > 4.0. I could live with that too, though it would break code that uses the constructor method. I don't see harm in supporting both. > > As I understand that this is a tricky decision, I created the patch which > improves at least the situation for the inference tests immediately with a > clean solution. Definitely +1 to apply your patch to fix the tests and maybe add some doc somewhere to make the workaround clearer to people using the distribution classes without sampling. Phil > > Thomas > > On Wed, Oct 8, 2014 at 2:11 PM, Luc Maisonobe <l...@spaceroots.org> wrote: > >> Hi Phil, >> >> Le 08/10/2014 02:38, Phil Steitz a écrit : >>> Comments in MATH-1154 (bad performance in test code) and the >>> now-reopened MATH-1124 (slow initialization) point to the need to >>> fix the problem we created when we moved sample() to the >>> distribution classes with PRNG provided by a final field with >>> default implementation initialized by the constructor. Several >>> suggestions have been made to improve things. >>> >>> 0) require that sample() take a RandomGenerator as an additional >>> parameter >>> 1) improve initialization performance of the default RandomGenerator >>> 2) Replace the default with a generator with negligible >>> initialization latency >>> 3) Make either PRNG initialization or initialization of the >>> RandomGenerator field lazy >>> >>> I may have missed some. One thing to note is that the (default) >>> Well generators are *not* threadsafe, so having things final, >>> avoiding lazy init is not really buying us anything in the current >>> setup. So unless 2) is done with a threadsafe generator, only 0) >>> really makes sample() threadsafe (assuming the caller protects the >>> instance that needs to be re-supplied on each call). >>> >>> I am +0 for adding a new method that takes a generator as an >>> additional actual parameter, but as a user I like the convenience of >>> just calling sample(). I never use distributions as singletons, so >>> the lack of threadsafety does not concern me. It seems then that a >>> reasonable approach might be to add the new method, keep the old one >>> but move to lazy initialization with documentation that when you use >>> the default, the provided PRNG it is not threadsafe. I would also >>> obviously be in favor or anything we can do in 1). >> Having a default generator is not really something I am comfortable >> with. I consider the generator is really of the responsibility of the >> caller and not a low level library. In my use cases, I even create only >> one generator for a run and reuse it in all the classes that need a >> generator rather than building different ones: the reason for that is >> that it greatly simplifies initialization (only one seed) and that as >> some generators may have some latency to reach a steady state[1], I >> prefer to have one generator with more drawings extracted from it than >> several independent generators. >> >> So my preferences would be towards solution 0 for this specific case in >> the list you provide. A side effect is that I don't think the original >> patch to MATH-1154 should be applied. As is, this patch hides even more >> the generator, and while preserving the final field in the top class, it >> does so by moving it in another class. If the fact this field is final >> is a problem, then we should lift this restriction and do it in the >> open, rather than hiding our tracks from class to class. >> >> Another choice would be to simply document that if the user explicitly >> sets up a null generator, then sampling will fail. This is essentially >> what Thomas patch does if I understand it. I would even go further: >> either deprecate the constructors without random generator (and simply >> explain in the remaining constructors that this generator can be set to >> null if no sampling is desired), or lets this constructors but have them >> set the generator to null (and document it). I *hate* having these new >> Well19937c() calls hidden without user knowledge. Creating random >> generators without seed is really something we should avoid. >> >> Apart from that, improving performances of the WELL generators is >> certainly a good thing. >> >> Still another point would be to make random generators thread safe. It >> would be important in my use cases since I *do* share generators and it >> is intentional. >> >> best regards, >> Luc >> >> [1] the first few millions drawings are not as good as the later ones, >> see the paper on the WELL generator for examples >> >>> Phil >>> >>> >>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>> For additional commands, e-mail: dev-h...@commons.apache.org >>> >>> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >> For additional commands, e-mail: dev-h...@commons.apache.org >> >> --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org