Re: [rng] stress test results

Alex Herbert Thu, 16 May 2019 07:38:20 -0700


> On 16 May 2019, at 15:33, Gilles Sadowski <gillese...@gmail.com> wrote:
> 
> Hi.
> 
> On Thu, 16 May 2019 at 16:04, Alex Herbert <alex.d.herb...@gmail.com> wrote:
>> 
>> 
>> 
>>> On 16 May 2019, at 14:42, Gilles Sadowski <gillese...@gmail.com> wrote:
>>> 
>>> Hello.
>>> 
>>> On Thu, 16 May 2019 at 12:06, Alex Herbert <alex.d.herb...@gmail.com> wrote:
>>>> 
>>>> I have run the stress test using the new application. The new application 
>>>> has two major changes over the previous application:
>>>> 
>>>> 1. It detects the platform byte-order and sends the bits in the correct 
>>>> order to be read by a C application
>>>> 2. The bridge to TestU01 has been updated to use all the input int values; 
>>>> previously it was using every other int value
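
For illustration, here is a minimal sketch of the first change: packing int output into bytes in the platform byte order so that a C program reading raw 32-bit values sees the same numbers. The class and method names are hypothetical and this is not the stress test application's actual code; it only assumes the standard java.nio classes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper; not taken from the stress test application. */
final class NativeOrderWriter {
    /** Packs each int into 4 bytes using the given byte order. */
    static byte[] toBytes(int[] values, ByteOrder order) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Integer.BYTES).order(order);
        for (int v : values) {
            buffer.putInt(v);
        }
        return buffer.array();
    }

    /** Packs the ints using the byte order of the platform running the JVM. */
    static byte[] toNativeBytes(int[] values) {
        return toBytes(values, ByteOrder.nativeOrder());
    }
}
```

A ByteBuffer defaults to big-endian, so on a little-endian machine output written without setting the order would be read byte-reversed by the C side.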
>>>> 
>>>> So we can expect differences from both test suites Dieharder and TestU01 
>>>> BigCrush.
>>>> 
>>>> For reference here are the old results (from the user guide, reordered to 
>>>> the RandomSource enum order):
>>>> 
>>>> RNG                     Dieharder       TestU01 (BigCrush)
>>>> JDK                     11, 12, 13      74, 72, 75
>>>> WELL_512_A              0, 0, 0         7, 6, 6
>>>> WELL_1024_A             0, 0, 0         4, 4, 5
>>>> WELL_19937_A            0, 0, 0         3, 2, 3
>>>> WELL_19937_C            0, 1, 0         2, 2, 3
>>>> WELL_44497_A            0, 0, 0         2, 3, 3
>>>> WELL_44497_B            0, 0, 0         2, 2, 2
>>>> MT                      0, 1, 0         3, 2, 2
>>>> ISAAC                   0, 0, 1         0, 1, 0
>>>> SPLIT_MIX_64            0, 0, 0         2, 0, 0
>>>> XOR_SHIFT_1024_S        0, 0, 0         2, 0, 0
>>>> TWO_CMRES               1, 1, 1         0, 0, 1
>>>> MT_64                   0, 0, 1         3, 2, 3
>>>> MWC_256                 0, 0, 0         0, 0, 0
>>>> KISS                    0, 0, 0         1, 2, 0
>>>> 
>>>> Here are the new results:
>>>> 
>>>> RNG                     Dieharder       TestU01 (BigCrush)
>>>> JDK                     4,4,4,4,4       74,72,74,73,74
>>>> WELL_512_A              0,0,0,0,0       7,6,6,6,6
>>>> WELL_1024_A             0,0,0,0,0       4,4,5,4,4
>>>> WELL_19937_A            0,1,0,0,1       3,3,2,2,2
>>>> WELL_19937_C            0,0,0,0,0       2,2,3,2,2
>>>> WELL_44497_A            0,0,0,0,0       2,2,2,2,3
>>>> WELL_44497_B            0,0,0,0,0       2,3,2,2,2
>>>> MT                      0,0,0,0,0       2,3,2,2,2
>>>> ISAAC                   0,0,0,0,0       0,1,2,0,0
>>>> SPLIT_MIX_64            0,0,0,0,0       1,0,0,0,0
>>>> XOR_SHIFT_1024_S        0,0,0,0,0       0,0,0,0,0
>>>> TWO_CMRES               2,2,2,2,2       4,3,3,5,4
>>>> MT_64                   0,0,0,0,0       2,3,2,2,2
>>>> MWC_256                 0,1,0,0,0       0,0,0,2,0
>>>> KISS                    0,0,0,0,0       0,0,0,0,0
>>>> XOR_SHIFT_1024_S_PHI    0,0,0,0,0       0,0,0,0,0
>>>> XO_RO_SHI_RO_64_S       0,0,0,0,0       1,1,2,1,3
>>>> XO_RO_SHI_RO_64_SS      0,0,0,0,0       0,0,0,0,0
>>>> XO_SHI_RO_128_PLUS      0,0,1,0,0       1,2,2,1,1
>>>> XO_SHI_RO_128_SS        0,0,0,1,0       0,1,0,0,0
>>>> XO_RO_SHI_RO_128_PLUS   0,0,0,0,0       0,1,0,0,0
>>>> XO_RO_SHI_RO_128_SS     0,0,0,0,0       1,0,1,0,0
>>>> XO_SHI_RO_256_PLUS      0,1,0,0,0       0,0,0,0,0
>>>> XO_SHI_RO_256_SS        0,0,0,0,0       0,1,0,2,1
>>>> XO_SHI_RO_512_PLUS      0,0,0,0,1       0,0,0,2,2
>>>> XO_SHI_RO_512_SS        0,0,0,0,0       0,1,0,1,0
>>>> 
>>>> (Note: All of the single fails except one under Dieharder are for the 
>>>> flawed diehard_sums test. I include it here for direct comparison with old 
>>>> results. I would recommend we strip this from the new results for the user 
>>>> guide.)
>>>> 
>>>> I ran them 3 times. Then because the results were different (mainly for 
>>>> the JDK generator for Dieharder) I double-checked everything and ran 
>>>> another 2. Results are still the same. Dieharder is much better for the 
>>>> JDK than previously. It systematically fails:
>>>> 
>>>> diehard_opso:0
>>>> diehard_oqso:0
>>>> diehard_dna:0
>>>> dab_bytedistrib:0
>>>> 
>>>> The TWO_CMRES generator is now worse as it is systematically failing:
>>>> 
>>>> diehard_oqso:0
>>>> diehard_dna:0
>>>> 
>>>> The results from BigCrush are similar for JDK and all the others except 
>>>> TWO_CMRES. This is now failing a few more tests. It systematically fails:
>>>> 
>>>> 1  SerialOver, r = 0
>>>> 41  Permutation, t = 5
>>>> 42  Permutation, t = 7
>>>> 
>>>> To check the JDK results for Dieharder I ran it 5 times using the wrong 
>>>> platform byte order (i.e. what the previous test application was doing).
>>>> 
>>>> Old results : 11, 12, 13
>>>> New results: 11,16,14,14,15
>>>> 
>>>> So this matches up. If the JDK output is byte reversed it is a poor 
>>>> generator.
>>>> 
>>>> A few sources I have read indicate that BigCrush favours the upper bits of 
>>>> a generator. A test should therefore run the generator bit reversed 
>>>> through the test application. Here are the full forward and backward 
>>>> results ignoring the diehard_sums test:
>>>> 
>>>> RNG                     Bit-reversed    Dieharder       TestU01 (BigCrush)
>>>> JDK                     false           4,4,4,4,4       74,72,74,73,74
>>>> JDK                     true            42,42,43,49,49  35,34,35,36,36
>>>> WELL_512_A              false           0,0,0,0,0       7,6,6,6,6
>>>> WELL_512_A              true            0,0,1,0,0       7,6,6,7,6
>>>> WELL_1024_A             false           0,0,0,0,0       4,4,5,4,4
>>>> WELL_1024_A             true            0,0,0,0,0       4,4,4,4,4
>>>> WELL_19937_A            false           0,1,0,0,0       3,3,2,2,2
>>>> WELL_19937_A            true            0,0,0,0,0       3,2,2,2,3
>>>> WELL_19937_C            false           0,0,0,0,0       2,2,3,2,2
>>>> WELL_19937_C            true            0,0,0,0,0       3,2,2,3,2
>>>> WELL_44497_A            false           0,0,0,0,0       2,2,2,2,3
>>>> WELL_44497_A            true            0,0,0,0,0       3,3,3,2,2
>>>> WELL_44497_B            false           0,0,0,0,0       2,3,2,2,2
>>>> WELL_44497_B            true            0,0,0,0,0       2,2,2,2,3
>>>> MT                      false           0,0,0,0,0       2,3,2,2,2
>>>> MT                      true            0,0,0,0,0       2,2,3,3,3
>>>> ISAAC                   false           0,0,0,0,0       0,1,2,0,0
>>>> ISAAC                   true            0,0,0,0,0       0,0,0,0,0
>>>> SPLIT_MIX_64            false           0,0,0,0,0       1,0,0,0,0
>>>> SPLIT_MIX_64            true            0,0,0,0,0       0,1,0,0,0
>>>> XOR_SHIFT_1024_S        false           0,0,0,0,0       0,0,0,0,0
>>>> XOR_SHIFT_1024_S        true            0,0,0,0,0       0,0,1,0,0
>>>> TWO_CMRES               false           2,2,2,2,2       4,3,3,5,4
>>>> TWO_CMRES               true            7,5,5,7,6       4,3,4,4,4
>>>> MT_64                   false           0,0,0,0,0       2,3,2,2,2
>>>> MT_64                   true            0,0,0,0,0       2,2,2,2,2
>>>> MWC_256                 false           0,0,0,0,0       0,0,0,2,0
>>>> MWC_256                 true            0,0,0,0,0       1,0,0,0,0
>>>> KISS                    false           0,0,0,0,0       0,0,0,0,0
>>>> KISS                    true            0,0,0,0,0       0,0,1,0,1
>>>> XOR_SHIFT_1024_S_PHI    false           0,0,0,0,0       0,0,0,0,0
>>>> XOR_SHIFT_1024_S_PHI    true            0,0,0,0,0       0,0,2,0,0
>>>> XO_RO_SHI_RO_64_S       false           0,0,0,0,0       1,1,2,1,3
>>>> XO_RO_SHI_RO_64_S       true            0,0,0,0,0       2,2,2,2,2
>>>> XO_RO_SHI_RO_64_SS      false           0,0,0,0,0       0,0,0,0,0
>>>> XO_RO_SHI_RO_64_SS      true            0,0,0,0,0       1,0,0,0,0
>>>> XO_SHI_RO_128_PLUS      false           0,0,0,0,0       1,2,2,1,1
>>>> XO_SHI_RO_128_PLUS      true            0,0,0,0,0       2,2,2,2,2
>>>> XO_SHI_RO_128_SS        false           0,0,0,0,0       0,1,0,0,0
>>>> XO_SHI_RO_128_SS        true            0,0,0,0,0       0,0,0,0,0
>>>> XO_RO_SHI_RO_128_PLUS   false           0,0,0,0,0       0,1,0,0,0
>>>> XO_RO_SHI_RO_128_PLUS   true            0,0,0,0,0       2,1,1,1,2
>>>> XO_RO_SHI_RO_128_SS     false           0,0,0,0,0       1,0,1,0,0
>>>> XO_RO_SHI_RO_128_SS     true            0,0,0,0,0       0,0,2,0,0
>>>> XO_SHI_RO_256_PLUS      false           0,0,0,0,0       0,0,0,0,0
>>>> XO_SHI_RO_256_PLUS      true            0,0,0,0,0       0,0,0,0,0
>>>> XO_SHI_RO_256_SS        false           0,0,0,0,0       0,1,0,2,1
>>>> XO_SHI_RO_256_SS        true            0,0,0,0,0       0,1,1,1,2
>>>> XO_SHI_RO_512_PLUS      false           0,0,0,0,0       0,0,0,2,2
>>>> XO_SHI_RO_512_PLUS      true            0,0,0,0,0       1,0,0,0,1
>>>> XO_SHI_RO_512_SS        false           0,0,0,0,0       0,1,0,1,0
>>>> XO_SHI_RO_512_SS        true            0,0,0,0,0       0,1,1,0,0
>>>> 
>>>> So bit reversed the JDK is terrible at Dieharder. It actually improves for 
>>>> BigCrush from terrible to less terrible. TWO_CMRES is a bit worse when 
>>>> bit-reversed at Dieharder but no different at BigCrush (it was already 
>>>> systematically failing 3 tests).
>>> 
>>> Is it the same version of "BigCrush"?  I'm surprised that TWO_CMRES
>>> has many more failures (bit-reversed or not).
>> 
>> I was surprised by that as well. I thought each sub-cycle generator within 
>> TWO_CMRES could almost pass BigCrush. So when combined the generator should 
>> easily pass it. Here is the version, same as all my previous usage:
>> 
>> Version: TestU01 1.2.3
>> 
>> I may investigate this further using the tests that systematically fail.
>> 
>>> 
>>>> 
>>>> All the other generators have similar results when bit reversed. So adding 
>>>> the bit-reversed results to the user-guide does not appear worthwhile. I 
>>>> will archive these and they can be added later if required, for example to 
>>>> show a good generator against a bad one. This will only be relevant if the 
>>>> library adds reference implementations of bad generators.
>>> 
>>> It's on Abhishek's TODO list (e.g. "LCG").
>> 
>> I’ll leave it until it is needed. For now it just adds a load of extra data 
>> with little merit to the user guide.
> 
> I mean that we'll have bad generators added to the library; but I agree
> that the bit-reversed results are not useful since users of the library
> would never see the wrong values.  It was the side-effect of a bug in
> the testing code.


It is a different way to test the generator. For certain usages it would be 
important to know that the lower-order bits are not poor. But perhaps a better 
use of time, and space in the user guide, is to add results for PractRand instead.
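
As a rough sketch of what the reversed runs amount to: wrapping a source of 32-bit values so that each output is bit-reversed (low-order bits become high-order bits) or byte-reversed (what the wrong byte order was effectively doing). The class name is hypothetical and this is not the code in the stress test application; it uses only Integer.reverse and Integer.reverseBytes from the JDK.

```java
import java.util.function.IntSupplier;

/** Hypothetical wrappers; not taken from the stress test application. */
final class Reversal {
    /** Each output has its 32 bits in reverse order: bit 0 becomes bit 31. */
    static IntSupplier bitReversed(IntSupplier source) {
        return () -> Integer.reverse(source.getAsInt());
    }

    /** Each output has its 4 bytes in reverse order; bits within a byte are kept. */
    static IntSupplier byteReversed(IntSupplier source) {
        return () -> Integer.reverseBytes(source.getAsInt());
    }
}
```

Any generator with a no-argument nextInt() could be passed in as a method reference. A test suite that favours the upper bits would then be exercising the generator's lower bits.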

> 
>> 
>>> 
>>>> Currently only the JDK is a bad generator.
>>>> 
>>>> Next:
>>>> 
>>>> I have added a ‘results’ command to the stress test application that can 
>>>> generate these results tables. It requires some header information not 
>>>> found in the old results files so only works with the new results. It can 
>>>> generate the APT table directly for the user guide. It will be useful 
>>>> going forward when more generators are added to update the results.
>>>> 
>>>> The new results are named using the test suite (dh_ or tu_), optionally 
>>>> the bit-reversed flag (r_), the enum ordinal and the trial run:
>>>> 
>>>> dh_1_1 = Dieharder for JDK trial 1
>>>> tu_1_1 = BigCrush for JDK trial 1
>>>> dh_r_2_3 = Dieharder bit reversed for WELL_512_A trial 3
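
To make the naming scheme concrete, here is a small sketch of a parser for these file names. The class, field names and regular expression are my own assumptions for illustration and are not taken from the ‘results’ command. The generator number is kept exactly as it appears in the name and is not mapped back to a RandomSource value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for names such as dh_1_1 or dh_r_2_3. */
final class ResultName {
    // suite prefix, optional bit-reversed flag, generator number, trial number
    private static final Pattern NAME = Pattern.compile("(dh|tu)_(r_)?(\\d+)_(\\d+)");

    final String suite;        // "dh" = Dieharder, "tu" = TestU01 BigCrush
    final boolean bitReversed; // true when the r_ flag is present
    final int generator;       // number identifying the generator, as written
    final int trial;           // trial run number

    private ResultName(String suite, boolean bitReversed, int generator, int trial) {
        this.suite = suite;
        this.bitReversed = bitReversed;
        this.generator = generator;
        this.trial = trial;
    }

    /** Parses a result name, rejecting anything that does not fit the scheme. */
    static ResultName parse(String name) {
        Matcher m = NAME.matcher(name);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognised result name: " + name);
        }
        return new ResultName(m.group(1), m.group(2) != null,
                Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4)));
    }
}
```

With this sketch, ResultName.parse("dh_r_2_3") gives suite "dh", bitReversed true, generator 2 and trial 3.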
>>>> 
>>>> I propose to:
>>>> 
>>>> - Delete all the old results and add these new ones using a new directory 
>>>> structure. All results can reside in a single directory.
>>>> - Ignore for now the bit-reversed results.
>>>> - Delete the old stress test code. The new code supersedes all 
>>>> functionality of the old version.
>>>> - Commit the new ‘results’ command when I have confirmed the APT table is 
>>>> correctly generated.
>>> 
>>> +1
>> 
>> OK.
>> 
>>> 
>>>> 
>>>> Questions:
>>>> 
>>>> 1. Do we stick to using 3 trials or update to 5 (because I have the 
>>>> results)?
>>> 
>>> +1
>> 
>> +1 to which? I assume sticking to 3 trials.
> 
> Fine with 5 trials. :-)
> 
>> 
>>> 
>>>> 2. Do we remove the diehard_sums test result?
>>>> 
>>>> I would recommend removing diehard_sums. It pollutes the results for most 
>>>> generators with a spurious fail that should be ignored.
>>> 
>>> +0 (as you wish)
>> 
>> The Dieharder web page and documentation indicate that this test should not 
>> be used.
> 
> Yes; I mentioned it on the "Commons RNG" web page.
> The result was there, just as it is output by "DieHarder" (it could be
> construed that DieHarder should skip the flawed test in the first place...).
> 
>> So adding it to the results is incorrect. I will document it as such. I’ll 
>> also update the ‘results’ command to ignore the test by default so you 
>> explicitly have to request it is included. This should prevent future 
>> updates to the user guide from including it by mistake.
> 
> Quite fine too.
> 
> Gilles
> 
