On 8/3/11 10:05 AM, Luc Maisonobe wrote: > Le 03/08/2011 18:15, Phil Steitz a écrit : >> On 8/3/11 9:02 AM, sebb wrote: >>> On 3 August 2011 09:06, Luc Maisonobe<luc.maison...@free.fr> >>> wrote: >>>> Le 03/08/2011 09:38, Luc Maisonobe a écrit : >>>>> Le 01/08/2011 22:40, Luc Maisonobe a écrit : >>>>>> Hi Phil, >>>>>> >>>>>> Le 01/08/2011 20:39, Phil Steitz a écrit : >>>>>>> On 8/1/11 1:31 AM, luc.maison...@free.fr wrote: >>>>>>>> Hi Phil, >>>>>>>> >>>>>>>> ----- Mail original ----- >>>>>>>>> In my own applications, I noticed what appears to be poor >>>>>>>>> performance in the nextInt(int) method of the Mersenne >>>>>>>>> twister, >>>>>>>>> which I was using to *improve* speed. I think that for >>>>>>>>> small n, the >>>>>>>>> default implementation in BistreamGenerator may be running >>>>>>>>> too many >>>>>>>>> iterations. >>>>>>>> Mersenne twister uses a quite large pool. It creates >>>>>>>> pseudo-random bits >>>>>>>> by twisting it and creates large bunches at a time (624 >>>>>>>> words at a >>>>>>>> time). >>>>>>>> Hence when you ask for large sets, you should have several >>>>>>>> calls that >>>>>>>> return fast, and one call that takes a longer time to >>>>>>>> generate another >>>>>>>> large pool. >>>>>>>> >>>>>>>> So good performances are obtained for generating large >>>>>>>> sets, not >>>>>>>> small sets. >>>>>>>> >>>>>>>> Well generators should be faster and are preferred over >>>>>>>> Mersenne >>>>>>>> twister now, >>>>>>>> which is now an old generator. Well generators also have >>>>>>>> large pools, >>>>>>>> but they >>>>>>>> don't generate bits in large batches in advance, they do >>>>>>>> generate a >>>>>>>> few words >>>>>>>> at a time. >>>>>>> Yeah, I know. Both are faster than the JDK, though, even for >>>>>>> just >>>>>>> 32-bit chunks in my tests at least. >>>>>>> >>>>>>> One thing I have been thinking about is exposing nextInt[], >>>>>>> nextDouble[], nextGaussian[] etc methods that take advantage >>>>>>> of the >>>>>>> pools. So you basically generate a large block of bits use >>>>>>> this to >>>>>>> fill the output arrays. >>>>>> Seems a very good idea. Most of the time, people generate >>>>>> only one kind >>>>>> of numbers several times, so it really does make sense. >>>>>> >>>>>>>>> I am still figuring out how the code works, but I >>>>>>>>> thought it would be good to run some benchmarks - using >>>>>>>>> Gilles' new >>>>>>>>> stuff - against the Harmony implementation in >>>>>>>>> java.util.Random of >>>>>>>>> this method. That led me to notice that there are no unit >>>>>>>>> tests for >>>>>>>>> BitstreamGenerator. I propose that we add >>>>>>>>> 0) RandomGeneratorAbstractTest with an abstract makeGenerator >>>>>>>>> method including fixed seed tests for all RandomGenerator >>>>>>>>> methods >>>>>>>>> 1) BitstreamGeneratorTest extending >>>>>>>>> RandomGeneratorAbstractTest >>>>>>>>> implementing makeGenerator with a BitStreamGenerator that >>>>>>>>> uses the >>>>>>>>> JDK generator for next(int) >>>>>>>>> 2) Make the test classes for Mersenne and Weil generators >>>>>>>>> extend >>>>>>>>> RandomGeneratorAbstractTest, moving redundant tests up >>>>>>>>> into the base >>>>>>>>> class >>>>>>>>> >>>>>>>>> Sound reasonable? >>>>>>>> +1 >>>>>>>> >>>>>>>>> Also, any recollection why we are using a >>>>>>>>> different implementation in BitStreamGenerator for >>>>>>>>> next(int) than >>>>>>>>> Harmony and the JDK use? >>>>>>>> I don't understand what you mean. next(int) is used to >>>>>>>> generate the raw >>>>>>>> bits and is the heart of each generator. Each generator has >>>>>>>> its own >>>>>>>> implementation. Replacing next(int) by the JDK generation >>>>>>>> would imply >>>>>>>> dropping completely Mersenne twister and Well generators. >>>>>>> I am sorry. I meant nextInt(int). It is that code that seems >>>>>>> to be >>>>>>> slow in BitStreamGenerator and different from the JDK and >>>>>>> Harmony. >>>>>> Could you point me at some code ? There are many pitfalls in >>>>>> netInt(int) >>>>>> if one wants to make sure the generator is uniform, which >>>>>> explain the >>>>>> strange implementation, with the mask computation and the >>>>>> loop. By the >>>>>> way, even this implementation would benefit from your >>>>>> proposed array >>>>>> generation, as the mask could be computed only once. >>>>> I have looked at the implementation for JDK and Harmony and am >>>>> a little >>>>> puzzled. >>>>> >>>>> The trick for the power of two (i.e. if ((n& -n) == n)) is >>>>> not useful >>>>> for the very elaborate generators like Mersenne twister or >>>>> Well. Both >>>>> are proven to be equidistributed even for the low order bits. >>>>> They are >>>>> based on linear recurrences but not linear congruences and do >>>>> not suffer >>>>> from the drawbacks of the latter. >>>>> >>>>> What puzzles me more is the loop. It is documented as avoiding >>>>> the >>>>> uneven distributions, but at first glance the modulo operation >>>>> bothers >>>>> me. As documentation explicitly states it is designed for >>>>> this, it is >>>>> most probably true, I simply don't understand how yet. >>>>> >>>>> So our current implementation is slow, then go ahead and >>>>> change it to >>>>> the one you showed me. I would simply suggest to get rid of >>>>> the ((n& >>>>> -n) == n) test. I'll try to understand the condition in the >>>>> while loop >>>>> to understand how it rejects uneven distributions, just out of >>>>> curiosity >>>>> for myself. >>>> OK, I finally understood the algorithm and how it rejects the >>>> largest >>>> incomplete numbers from k*n to (2^31)-1 where k*n is the >>>> largest multiple of >>>> n that fits in a positive integer. The trick lies in the >>>> addition of (n-1) >>>> which overflows the integer and wraps the result back to >>>> negative values. It >>>> is smart. >>>> >>>> +1 to use it. >>> Provided that the algorithm is documented ... >> >> Yeah, I was going to try to decipher it (and the current impl) and >> provide some doc. One other thing to consider in this decision is >> do we have to worry about encumberance. The Harmony impl looks very >> similar to what is described in the JDK javadoc. I wonder if >> SunOracle might have claim to it. >> >> Where did you get the current impl, Luc? > > Unfortunately, I can't remember and I did not document it, my bad. > For sure, I did not invent it. > > Luc > >> >> Now that we have test infrastructure and Gilles' new tool, I will >> run some benchmarks to see what, if anything, the Harmony/JDK impl >> buys us.
I just ran the benchmarks and opened MATH-642 for this. Looks like it is worth it to change. Now I just need to decipher the Harmony algorithm and get it documented. Phil >> >> Phil >>> >>>> Luc >>>> >>>>> Luc >>>>> >>>>>> Luc >>>>>> >>>>>> >>>>>>> Phil >>>>>>>> Mersenne twister and Well should be fast for generating >>>>>>>> large sets, but >>>>>>>> most importantly they have very good and *proven* properties >>>>>>>> (equidistribution >>>>>>>> on large dimensions, null correlation, maximal period ...). >>>>>>>> These >>>>>>>> properties >>>>>>>> are essential for example in Monte-Carlo simulations with >>>>>>>> lots of >>>>>>>> variables that >>>>>>>> must be independent or have controlled correlations. >>>>>>>> >>>>>>>> Luc >>>>>>>> >>>>>>>>> The Harmony impl is almost identical to >>>>>>>>> what is documented in the JDK javadoc. >>>>>>>>> >>>>>>>>> Phil >>>>>>>>> >>>>>>>>> --------------------------------------------------------------------- >>>>>>>>> >>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>>>>>>>> For additional commands, e-mail: dev-h...@commons.apache.org >>>>>>>>> >>>>>>>>> >>>>>>>> --------------------------------------------------------------------- >>>>>>>> >>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>>>>>>> For additional commands, e-mail: dev-h...@commons.apache.org >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> --------------------------------------------------------------------- >>>>>>> >>>>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>>>>>> For additional commands, e-mail: dev-h...@commons.apache.org >>>>>>> >>>>>> >>>>>> --------------------------------------------------------------------- >>>>>> >>>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>>>>> For additional commands, e-mail: dev-h...@commons.apache.org >>>>>> >>>>>> >>>>> >>>>> --------------------------------------------------------------------- >>>>> >>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>>>> For additional commands, e-mail: dev-h...@commons.apache.org >>>>> >>>>> >>>> >>>> --------------------------------------------------------------------- >>>> >>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>>> For additional commands, e-mail: dev-h...@commons.apache.org >>>> >>>> >>> --------------------------------------------------------------------- >>> >>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>> For additional commands, e-mail: dev-h...@commons.apache.org >>> >>> >> >> >> --------------------------------------------------------------------- >> >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >> For additional commands, e-mail: dev-h...@commons.apache.org >> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org