On 10/06/2019 17:18, Gilles Sadowski wrote:
On Mon, 10 June 2019 at 17:56, Alex Herbert <alex.d.herb...@gmail.com> wrote:

On 10/06/2019 16:34, Gilles Sadowski wrote:
Hello.

On Mon, 10 June 2019 at 17:17, Alex Herbert <alex.d.herb...@gmail.com> wrote:
On 10/06/2019 15:31, Gilles Sadowski wrote:
P.S. Thinking of releasing 1.3?
Not yet. I think there are a few outstanding items that work together
for the multi-threaded focus of the new code and the new generators:
Sure, but some of them could be postponed, if only to RERO.

- RNG-98: LongJumpable (easy)

- RNG-102: SharedStateSampler (lots of easy work)

- RNG-106: XorShiRo generators require non-zero input seeds

(I'm still thinking about the best way to do this. The Jira ticket
suggests a speed test to at least know the implications of different ideas.)
This is only when using the "SeedFactory" (?).  [Otherwise, it's the
user's responsibility to construct an appropriate seed.]

Couldn't we just check that the output of the internal generator is not
all zero (and call it again if it is)?
Yes. The worst-case scenario is a 1 in 2^64 chance of an all-zero seed.
All other generators have larger state sizes. So this would be fine. An
alternative would be to set a single bit to 1. This throws away 1
bit of randomness from the seed and will always work without any
recursion. But it makes the seed worse. The ideas are in the header for
this Jira ticket:

https://issues.apache.org/jira/browse/RNG-106

I'll fix this soon.
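
The two options discussed above can be sketched as follows. This is only an illustration, not the Commons RNG code: the class `NonZeroSeed` and its method names are hypothetical, and `java.util.SplittableRandom` stands in for the internal seed generator.

```java
import java.util.SplittableRandom;

/** Hypothetical helper illustrating two ways to avoid an all-zero seed. */
final class NonZeroSeed {
    private NonZeroSeed() {}

    /** Option 1: fill the seed, and fill it again if every element is zero. */
    static long[] retryUntilNonZero(SplittableRandom source, int size) {
        checkSize(size);
        long[] seed = new long[size];
        do {
            for (int i = 0; i < size; i++) {
                seed[i] = source.nextLong();
            }
        } while (isAllZero(seed));
        return seed;
    }

    /** Option 2: fill once and force one bit to 1 (never loops, loses 1 bit). */
    static long[] forceLowBit(SplittableRandom source, int size) {
        checkSize(size);
        long[] seed = new long[size];
        for (int i = 0; i < size; i++) {
            seed[i] = source.nextLong();
        }
        seed[0] |= 1L; // the lowest bit of the first element is no longer random
        return seed;
    }

    /** Returns true if every element of the array is zero. */
    static boolean isAllZero(long[] seed) {
        for (long v : seed) {
            if (v != 0) {
                return false;
            }
        }
        return true;
    }

    private static void checkSize(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("Seed size must be positive: " + size);
        }
    }
}
```

For a single 64-bit element the retry loop repeats with probability 2^-64, so option 1 keeps every bit of the seed random at the cost of one zero check.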

The other item I did not mention is the outcome of RNG-104. This seems to
indicate that using System.identityHashCode(new Object()) is not as good
a mixer as a ThreadLocal random generator, both for speed and also
quality. I'm currently testing Well44497b ^ SplitMix in BigCrush but I
think this should replace the identity hash code method.
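
One way to picture the comparison is the sketch below. It is not the SeedFactory code: `SeedMixers` and its methods are hypothetical names, and a thread-local `java.util.SplittableRandom` (a SplitMix-style generator) is assumed as the replacement mixer.

```java
import java.util.SplittableRandom;

/** Hypothetical sketch of two ways to mix extra bits into a seed value. */
final class SeedMixers {
    // Assumption: one SplittableRandom per thread, seeded by the JDK.
    private static final ThreadLocal<SplittableRandom> LOCAL =
        ThreadLocal.withInitial(SplittableRandom::new);

    private SeedMixers() {}

    /** Mix using an identity hash code: an int, so only the low 32 bits can change. */
    static long mixWithIdentityHash(long value) {
        long hash = System.identityHashCode(new Object()) & 0xffffffffL;
        return value ^ hash;
    }

    /** Mix using a thread-local generator: all 64 bits can change, no allocation. */
    static long mixWithThreadLocal(long value) {
        return value ^ LOCAL.get().nextLong();
    }
}
```

The identity hash variant allocates an object per call and never touches the upper 32 bits of a long, which is one plausible reason for it scoring worse on both speed and quality.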
Didn't you also suggest using XOR_SHIFT_1024_PHI (given the
large enough period, better speed score on BigCrush)?
Yes. I'm still thinking about this. My initial calculations were based
on the length of time it would take to sample all the seeds from the
generator. Below we consider the number of possible seeds required and
produced:

The Well44497b seed size is 1391 ints so there are (2^32)^1391
possible seeds of random bits, or 2^1423.

The XorShift1024 period is (2^1024 - 1) longs so there are (2^64)^1024
random bits output before repeat (ideally as the period may be shorter).
So the bit output is max 2^1088 before repeat.

Assuming the output from XorShift1024 is truly random-per-bit it cannot
output enough unique seeds to cover all those required by the Well44497b
generator.

But currently we seed using a maximum of 128 values and leave the rest
to the self-seeding routine in the base generator. For a long array this
is (2^64)^128 = 2^192, and only 2^160 for int arrays. So the
XorShift1024 generator can output enough bits to create all the seeds
that the SeedFactory currently produces.
I'd think (roughly) that's a good enough compromise for a
"convenience" routine. [Stringent requirements can be met
explicitly otherwise.]

Sorry. I've done that wrong:

(a^b)^c = a^(b*c), not a^(b+c)

The Well44497b seed size is 1391 ints, so there are 32*1391 (44512) bits in the seed, and so 2^44512 possible seeds.

The XorShift1024 period is (2^1024 - 1) longs, so there are about 64*2^1024 random bits output before repeat, or 2^6 * 2^1024 = 2^1030.

Not enough.

The max seed size is long[128] which is 64*128 bits in the seed, and so 2^8192 possible seeds.

Still not enough.
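
The corrected counts above can be reproduced in a few lines. This is only a sketch: `SeedSpace` and its methods are hypothetical names, all quantities are expressed as base-2 logarithms, and the period is approximated as 2^1024 (the -1 is ignored).

```java
/** Hypothetical helper: seed-space and output sizes as base-2 logarithms. */
final class SeedSpace {
    private SeedSpace() {}

    /** log2 of the number of distinct seeds: (2^bits)^length = 2^(bits*length). */
    static long log2SeedCount(int bitsPerElement, int length) {
        return (long) bitsPerElement * length;
    }

    /**
     * log2 of the bits output over one period of 2^periodLog2 values.
     * Assumes bitsPerValue is a power of two (32 or 64).
     */
    static long log2OutputBits(int bitsPerValue, int periodLog2) {
        if (Integer.bitCount(bitsPerValue) != 1) {
            throw new IllegalArgumentException("Not a power of two: " + bitsPerValue);
        }
        // log2(bitsPerValue * 2^periodLog2) = log2(bitsPerValue) + periodLog2
        return Integer.numberOfTrailingZeros(bitsPerValue) + (long) periodLog2;
    }
}
```

Here `log2SeedCount(32, 1391)` gives 44512 for Well44497b, `log2SeedCount(64, 128)` gives 8192 for a long[128] seed, and `log2OutputBits(64, 1024)` gives 1030 for XorShift1024, so the generator's whole output is far smaller than either seed space.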

So even though it would take 2^969 years to repeat the period of a XorShift1024 generator if sampled 10 billion times a second [1], it cannot produce every possible seed currently required.

So this is a dilemma. Choose a generator that can theoretically output all the seeds required, even though you could never use them, or choose a faster generator that can still output more seeds than you could possibly use.

[1] https://issues.apache.org/jira/browse/RNG-104?focusedCommentId=16857619&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16857619


So a switch to using XorShift1024 would satisfy current requirements
and, given it passes BigCrush, would remove the requirement to mix the
output with a second generator. (IIUC the purpose is to improve
randomness of Well44497.)
Is it really necessary to have the utmost randomness for generating
seeds?  [It seems that the (rare) correlations of a seed used by some
RNG instance with a seed used by some other instance will be "diluted"
in the different sequences generated by the two instances.]

Whatever, replacing with XorShift1024 will gain on that too, and
as you note, keep the SeedFactory simpler (no mix).

Regards,
Gilles

[...]

