Hi.

On Mon, 10 Jun 2019 at 18:33, Alex Herbert <alex.d.herb...@gmail.com> wrote:
>
>
> On 10/06/2019 17:18, Gilles Sadowski wrote:
> > On Mon, 10 Jun 2019 at 17:56, Alex Herbert <alex.d.herb...@gmail.com>
> > wrote:
> >>
> >> On 10/06/2019 16:34, Gilles Sadowski wrote:
> >>> Hello.
> >>>
> >>> On Mon, 10 Jun 2019 at 17:17, Alex Herbert <alex.d.herb...@gmail.com>
> >>> wrote:
> >>>> On 10/06/2019 15:31, Gilles Sadowski wrote:
> >>>>>>> P.S. Thinking of releasing 1.3?
> >>>>>> Not yet. I think there are a few outstanding items that work together
> >>>>>> for the multi-threaded focus of the new code and the new generators:
> >>>>> Sure but some of them could be postponed, if just to RERO.
> >>>>>
> >>>>>> - RNG-98: LongJumpable (easy)
> >>>>>>
> >>>>>> - RNG-102: SharedStateSampler (lots of easy work)
> >>>>>>
> >>>>>> - RNG-106: XorShiRo generators require non-zero input seeds
> >>>>>>
> >>>>>> (I'm still thinking about the best way to do this. The Jira ticket
> >>>>>> suggests a speed test to at least know the implications of different
> >>>>>> ideas.)
> >>>>> This is only when using the "SeedFactory" (?). [Otherwise, it's the
> >>>>> user's responsibility to construct an appropriate seed.]
> >>>>>
> >>>>> Couldn't we just check that the output of the internal generator is not
> >>>>> all zero (and call it again if it is)?
> >>>> Yes. The worst case scenario is a 1 in 2^64 collision rate with zero.
> >>>> All other generators have larger state sizes. So this would be fine. An
> >>>> alternative would be to set a single bit to non-zero. This throws away 1
> >>>> bit of randomness from the seed and will always work without any
> >>>> recursion. But it makes the seed worse. The ideas are in the header for
> >>>> this Jira ticket:
> >>>>
> >>>> https://issues.apache.org/jira/browse/RNG-106
> >>>>
> >>>> I'll fix this soon.
> >>>>
> >>>> The other item I did not mention is the outcome from RNG-104.
> >>>> This seems to
> >>>> indicate that using System.identityHashCode(new Object()) is not as good
> >>>> a mixer as a ThreadLocal random generator, both for speed and also
> >>>> quality. I'm currently testing Well44497b ^ SplitMix in BigCrush but I
> >>>> think this should replace the identity hash code method.
> >>> Didn't you also suggest to use XOR_SHIFT_1024_PHI (given the
> >>> large enough period, better speed score on BigCrush)?
> >> Yes. I'm still thinking about this. My initial calculations were based
> >> on the length of time it would take to sample all the seeds from the
> >> generator. Below we consider the number of possible seeds required and
> >> produced:
> >>
> >> The Well44497b seed size is 1391 of ints so there are (2^32)^1391
> >> possible seeds of random bits, or 2^1423.
> >>
> >> The XorShift1024 period is (2^1024 -1) of longs so there are (2^64)^1024
> >> random bits output before repeat (ideally as the period may be shorter).
> >> So the bit output is max 2^1088 before repeat.
> >>
> >> Assuming the output from XorShift1024 is truly random-per-bit it cannot
> >> output enough unique seeds to cover all those required by the Well44497b
> >> generator.
> >>
> >> But currently we seed using a maximum of 128 values and leave the rest
> >> to the self-seeding routine in the base generator. For a long array this
> >> is (2^64)^128 = 2^192, and only 2^160 for int arrays. So the
> >> XorShift1024 generator can output enough bits to create all the seeds
> >> that the SeedFactory currently produces.
> > I'd think (roughly) that's a good enough compromise for a
> > "convenience" routine. [Stringent requirements can be met
> > explicitly otherwise.]
>
> Sorry. I've done that wrong:
>
> (a^b)^c = a^(b*c) not a^(b+c)
>
> Well44497b seed size is 1391 of ints so there are 32*1391 (44512) bits
> in the seed, and so 2^44512 possible seeds.
>
> XorShift1024 period is (2^1024 -1) of longs so there are 64*2^1024
> random bits output before repeat, or 2^6 * 2^1024 = 2^1030.
>
> Not enough.
>
> The max seed size is long[128] which is 64*128 bits in the seed, and so
> 2^8192 possible seeds.
>
> Still not enough.
>
> So even though it would take 2^969 years to repeat the period of a
> XorShift1024 generator if sampled 10 billion times a second [1], it
> cannot produce every possible seed currently required.
>
> So this is a dilemma. Choose a generator that can theoretically output
> all the seeds required, even though you could never use them, or choose
> a faster generator that can still output more seeds than you could
> possibly use.
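The corrected counts quoted above can be checked with a few lines of Java. This is only a sketch: the class and method names are made up for illustration, and the period is rounded up from 2^1024 - 1 to 2^1024.

```java
import java.math.BigInteger;

// Illustrative only: these names are hypothetical and not part of Commons RNG.
final class SeedSpaceCheck {
    /** Bits in a seed made of {@code words} elements of {@code bitsPerWord} bits each. */
    static int seedBits(int words, int bitsPerWord) {
        return words * bitsPerWord;
    }

    /** Bits emitted over one period, taking the period as 2^periodLog2 outputs (rounded up). */
    static BigInteger periodBits(int periodLog2, int bitsPerWord) {
        return BigInteger.ONE.shiftLeft(periodLog2).multiply(BigInteger.valueOf(bitsPerWord));
    }

    /** True if the number of possible seeds, 2^seedBits, exceeds the given bit budget. */
    static boolean moreSeedsThanBits(int seedBits, BigInteger bits) {
        return BigInteger.ONE.shiftLeft(seedBits).compareTo(bits) > 0;
    }

    public static void main(String[] args) {
        int wellSeedBits = seedBits(1391, 32);          // int[1391] seed of Well44497b
        int maxSeedBits = seedBits(128, 64);            // long[128], the largest seed quoted
        BigInteger xorShiftBits = periodBits(1024, 64); // 64 * 2^1024 = 2^1030 bits

        System.out.println(wellSeedBits);                                  // 44512
        System.out.println(maxSeedBits);                                   // 8192
        System.out.println(xorShiftBits.bitLength() - 1);                  // 1030
        System.out.println(moreSeedsThanBits(wellSeedBits, xorShiftBits)); // true
        System.out.println(moreSeedsThanBits(maxSeedBits, xorShiftBits));  // true
    }
}
```

Both comparisons agree with the quoted text: 2^44512 and 2^8192 possible seeds each exceed the 2^1030 bits that one period can emit.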
How about using both? Generate the first element of the seed array (or the
single "long" seed) from Well44497b and the rest from XorShift1024?

Gilles

>
> [1]
> https://issues.apache.org/jira/browse/RNG-104?focusedCommentId=16857619&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16857619
>
>
> >
> >> So a switch to using XorShift1024 would satisfy current requirements
> >> and, given it passes BigCrush, would remove the requirement to mix the
> >> output with a second generator. (IIUC the purpose is to improve
> >> randomness of Well44497.)
> > Is it really necessary to have the utmost randomness for generating
> > seeds? [It seems that the (rare) correlations of a seed used by some
> > RNG instance with a seed used by some other instance will be "diluted"
> > in the different sequences generated by the two instances.]
> >
> > Whatever, replacing with XorShift1024 will gain on that too, and
> > as you note, keep the SeedFactory simpler (no mix).
> >
> > Regards,
> > Gilles
> >
> >>> [...]
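P.S. To make the "use both" suggestion above concrete, here is a rough sketch. Everything in it is an assumption for illustration: the class and method names are invented, the two sources are passed in as plain LongSupplier instances, and SplittableRandom is only a stand-in so that the snippet runs without Commons RNG on the classpath. It does not show how SeedFactory is actually written.

```java
import java.util.SplittableRandom;
import java.util.function.LongSupplier;

// Hypothetical sketch: not the SeedFactory implementation.
final class MixedSeedSketch {
    /**
     * Fills a seed array using {@code first} for element 0 and {@code rest}
     * for all remaining elements.
     */
    static long[] createSeed(int size, LongSupplier first, LongSupplier rest) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive: " + size);
        }
        long[] seed = new long[size];
        // Element 0 (or the single "long" seed) comes from the large-state source.
        seed[0] = first.getAsLong();
        // The remaining elements come from the faster source.
        for (int i = 1; i < size; i++) {
            seed[i] = rest.getAsLong();
        }
        return seed;
    }

    public static void main(String[] args) {
        // Stand-ins only: the proposal would use a Well44497b instance for
        // "first" and an XorShift1024 instance for "rest".
        SplittableRandom slowSource = new SplittableRandom();
        SplittableRandom fastSource = new SplittableRandom();
        long[] seed = createSeed(4, slowSource::nextLong, fastSource::nextLong);
        System.out.println(seed.length);
    }
}
```

With size 1 only the first source is consulted, which matches the single "long" seed case.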