Re: [rng] stress test results

Gilles Sadowski Thu, 16 May 2019 06:54:08 -0700

Hello.

On Thu, 16 May 2019 at 12:06, Alex Herbert <alex.d.herb...@gmail.com> wrote:
>
> I have run the stress test using the new application. It has two major
> changes compared with the previous application:
>
> 1. It detects the platform byte-order and sends the bits in the correct order 
> to be read by a C application
> 2. The bridge to TestU01 has been updated to use all the input int values;
> previously it was using only every other int value.
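A minimal Java sketch of the first change, assuming the application writes each int to a byte stream that the C test harness reads back as 32-bit values. `NativeOrderWriter` is a hypothetical name, not the stress test application's actual code; the only library call it relies on for the byte order is `java.nio.ByteOrder.nativeOrder()`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.function.IntSupplier;

/**
 * Hypothetical sketch: write generator output as raw bytes in the platform's
 * native byte order so that a C reader sees the same int values Java produced.
 */
final class NativeOrderWriter {
    private NativeOrderWriter() {}

    /** Writes {@code count} ints from {@code source} to {@code out} in native byte order. */
    static void write(IntSupplier source, int count, OutputStream out) throws IOException {
        // A new ByteBuffer is BIG_ENDIAN by default; nativeOrder() is LITTLE_ENDIAN on x86.
        ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES).order(ByteOrder.nativeOrder());
        for (int i = 0; i < count; i++) {
            buffer.clear();                      // reset position to 0 for the next value
            buffer.putInt(source.getAsInt());    // encode using the native order
            out.write(buffer.array(), 0, Integer.BYTES);
        }
    }
}
```

Reusing a single 4-byte buffer keeps the sketch short; a real bridge would more likely fill a larger buffer per write call.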
>
> So we can expect the results to differ for both test suites, Dieharder and
> TestU01 BigCrush.
>
> For reference here are the old results (from the user guide, reordered to the 
> RandomSource enum order):
>
> RNG                     Dieharder       TestU01 (BigCrush)
> JDK                     11, 12, 13      74, 72, 75
> WELL_512_A              0, 0, 0         7, 6, 6
> WELL_1024_A             0, 0, 0         4, 4, 5
> WELL_19937_A            0, 0, 0         3, 2, 3
> WELL_19937_C            0, 1, 0         2, 2, 3
> WELL_44497_A            0, 0, 0         2, 3, 3
> WELL_44497_B            0, 0, 0         2, 2, 2
> MT                      0, 1, 0         3, 2, 2
> ISAAC                   0, 0, 1         0, 1, 0
> SPLIT_MIX_64            0, 0, 0         2, 0, 0
> XOR_SHIFT_1024_S        0, 0, 0         2, 0, 0
> TWO_CMRES               1, 1, 1         0, 0, 1
> MT_64                   0, 0, 1         3, 2, 3
> MWC_256                 0, 0, 0         0, 0, 0
> KISS                    0, 0, 0         1, 2, 0
>
> Here are the new results:
>
> RNG                     Dieharder       TestU01 (BigCrush)
> JDK                     4,4,4,4,4       74,72,74,73,74
> WELL_512_A              0,0,0,0,0       7,6,6,6,6
> WELL_1024_A             0,0,0,0,0       4,4,5,4,4
> WELL_19937_A            0,1,0,0,1       3,3,2,2,2
> WELL_19937_C            0,0,0,0,0       2,2,3,2,2
> WELL_44497_A            0,0,0,0,0       2,2,2,2,3
> WELL_44497_B            0,0,0,0,0       2,3,2,2,2
> MT                      0,0,0,0,0       2,3,2,2,2
> ISAAC                   0,0,0,0,0       0,1,2,0,0
> SPLIT_MIX_64            0,0,0,0,0       1,0,0,0,0
> XOR_SHIFT_1024_S        0,0,0,0,0       0,0,0,0,0
> TWO_CMRES               2,2,2,2,2       4,3,3,5,4
> MT_64                   0,0,0,0,0       2,3,2,2,2
> MWC_256                 0,1,0,0,0       0,0,0,2,0
> KISS                    0,0,0,0,0       0,0,0,0,0
> XOR_SHIFT_1024_S_PHI    0,0,0,0,0       0,0,0,0,0
> XO_RO_SHI_RO_64_S       0,0,0,0,0       1,1,2,1,3
> XO_RO_SHI_RO_64_SS      0,0,0,0,0       0,0,0,0,0
> XO_SHI_RO_128_PLUS      0,0,1,0,0       1,2,2,1,1
> XO_SHI_RO_128_SS        0,0,0,1,0       0,1,0,0,0
> XO_RO_SHI_RO_128_PLUS   0,0,0,0,0       0,1,0,0,0
> XO_RO_SHI_RO_128_SS     0,0,0,0,0       1,0,1,0,0
> XO_SHI_RO_256_PLUS      0,1,0,0,0       0,0,0,0,0
> XO_SHI_RO_256_SS        0,0,0,0,0       0,1,0,2,1
> XO_SHI_RO_512_PLUS      0,0,0,0,1       0,0,0,2,2
> XO_SHI_RO_512_SS        0,0,0,0,0       0,1,0,1,0
>
> (Note: All of the single failures under Dieharder except one are for the
> flawed diehard_sums test. I include it here for direct comparison with the
> old results, but I would recommend we strip it from the new results for the
> user guide.)
>
> I ran them 3 times. Then, because the results differed from the old ones
> (mainly for the JDK generator under Dieharder), I double-checked everything
> and ran another 2. The results did not change. Dieharder is much better for
> the JDK than previously. It systematically fails:
>
> diehard_opso:0
> diehard_oqso:0
> diehard_dna:0
> dab_bytedistrib:0
>
> The TWO_CMRES generator is now worse as it is systematically failing:
>
> diehard_oqso:0
> diehard_dna:0
>
> The results from BigCrush are similar for the JDK and all the others except
> TWO_CMRES, which now fails a few more tests. It systematically fails:
>
> 1  SerialOver, r = 0
> 41  Permutation, t = 5
> 42  Permutation, t = 7
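Reading "systematically fails" as "fails in every trial", the systematic failures are the intersection of the per-trial sets of failed tests. A minimal sketch under that assumption; `SystematicFailures` is a hypothetical helper:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: a test is "systematic" if it fails in every trial. */
final class SystematicFailures {
    private SystematicFailures() {}

    static Set<String> find(List<Set<String>> failuresPerTrial) {
        if (failuresPerTrial.isEmpty()) {
            return new LinkedHashSet<>();
        }
        // Start from the first trial and keep only tests that also fail in all later trials.
        Set<String> common = new LinkedHashSet<>(failuresPerTrial.get(0));
        for (Set<String> trial : failuresPerTrial.subList(1, failuresPerTrial.size())) {
            common.retainAll(trial);
        }
        return common;
    }
}
```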
>
> To check the JDK results for Dieharder I ran it 5 times using the wrong 
> platform byte order (i.e. what the previous test application was doing).
>
> Old results: 11, 12, 13
> New results: 11,16,14,14,15
>
> So this matches up: when its output is byte-reversed, the JDK is a poor generator.
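To make the byte-order check concrete: writing an int in the non-native order and reading it back in the native order yields exactly `Integer.reverseBytes` of the value, so the old results correspond to testing a byte-reversed JDK stream. A small sketch (the class name is hypothetical):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical sketch: a writer using the wrong byte order, read back by a
 * native-order reader, produces Integer.reverseBytes of every value.
 */
final class ByteOrderMismatch {
    private ByteOrderMismatch() {}

    /** Returns the int a native-order reader sees when the writer used the opposite order. */
    static int readAsNative(int javaValue) {
        ByteOrder nativeOrder = ByteOrder.nativeOrder();
        ByteOrder wrongOrder = nativeOrder == ByteOrder.BIG_ENDIAN
                ? ByteOrder.LITTLE_ENDIAN
                : ByteOrder.BIG_ENDIAN;
        ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES);
        buffer.order(wrongOrder).putInt(0, javaValue);  // writer: wrong order
        return buffer.order(nativeOrder).getInt(0);     // reader: native order
    }
}
```

For example, 0x12345678 is seen as 0x78563412 regardless of the platform.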
>
> A few sources I have read indicate that BigCrush favours the upper bits of a
> generator. A thorough test should therefore also pass the bit-reversed output
> of the generator through the test application. Here are the full forward and
> bit-reversed results, ignoring the diehard_sums test:
>
> RNG                     Bit-reversed    Dieharder       TestU01 (BigCrush)
> JDK                     false           4,4,4,4,4       74,72,74,73,74
> JDK                     true            42,42,43,49,49  35,34,35,36,36
> WELL_512_A              false           0,0,0,0,0       7,6,6,6,6
> WELL_512_A              true            0,0,1,0,0       7,6,6,7,6
> WELL_1024_A             false           0,0,0,0,0       4,4,5,4,4
> WELL_1024_A             true            0,0,0,0,0       4,4,4,4,4
> WELL_19937_A            false           0,1,0,0,0       3,3,2,2,2
> WELL_19937_A            true            0,0,0,0,0       3,2,2,2,3
> WELL_19937_C            false           0,0,0,0,0       2,2,3,2,2
> WELL_19937_C            true            0,0,0,0,0       3,2,2,3,2
> WELL_44497_A            false           0,0,0,0,0       2,2,2,2,3
> WELL_44497_A            true            0,0,0,0,0       3,3,3,2,2
> WELL_44497_B            false           0,0,0,0,0       2,3,2,2,2
> WELL_44497_B            true            0,0,0,0,0       2,2,2,2,3
> MT                      false           0,0,0,0,0       2,3,2,2,2
> MT                      true            0,0,0,0,0       2,2,3,3,3
> ISAAC                   false           0,0,0,0,0       0,1,2,0,0
> ISAAC                   true            0,0,0,0,0       0,0,0,0,0
> SPLIT_MIX_64            false           0,0,0,0,0       1,0,0,0,0
> SPLIT_MIX_64            true            0,0,0,0,0       0,1,0,0,0
> XOR_SHIFT_1024_S        false           0,0,0,0,0       0,0,0,0,0
> XOR_SHIFT_1024_S        true            0,0,0,0,0       0,0,1,0,0
> TWO_CMRES               false           2,2,2,2,2       4,3,3,5,4
> TWO_CMRES               true            7,5,5,7,6       4,3,4,4,4
> MT_64                   false           0,0,0,0,0       2,3,2,2,2
> MT_64                   true            0,0,0,0,0       2,2,2,2,2
> MWC_256                 false           0,0,0,0,0       0,0,0,2,0
> MWC_256                 true            0,0,0,0,0       1,0,0,0,0
> KISS                    false           0,0,0,0,0       0,0,0,0,0
> KISS                    true            0,0,0,0,0       0,0,1,0,1
> XOR_SHIFT_1024_S_PHI    false           0,0,0,0,0       0,0,0,0,0
> XOR_SHIFT_1024_S_PHI    true            0,0,0,0,0       0,0,2,0,0
> XO_RO_SHI_RO_64_S       false           0,0,0,0,0       1,1,2,1,3
> XO_RO_SHI_RO_64_S       true            0,0,0,0,0       2,2,2,2,2
> XO_RO_SHI_RO_64_SS      false           0,0,0,0,0       0,0,0,0,0
> XO_RO_SHI_RO_64_SS      true            0,0,0,0,0       1,0,0,0,0
> XO_SHI_RO_128_PLUS      false           0,0,0,0,0       1,2,2,1,1
> XO_SHI_RO_128_PLUS      true            0,0,0,0,0       2,2,2,2,2
> XO_SHI_RO_128_SS        false           0,0,0,0,0       0,1,0,0,0
> XO_SHI_RO_128_SS        true            0,0,0,0,0       0,0,0,0,0
> XO_RO_SHI_RO_128_PLUS   false           0,0,0,0,0       0,1,0,0,0
> XO_RO_SHI_RO_128_PLUS   true            0,0,0,0,0       2,1,1,1,2
> XO_RO_SHI_RO_128_SS     false           0,0,0,0,0       1,0,1,0,0
> XO_RO_SHI_RO_128_SS     true            0,0,0,0,0       0,0,2,0,0
> XO_SHI_RO_256_PLUS      false           0,0,0,0,0       0,0,0,0,0
> XO_SHI_RO_256_PLUS      true            0,0,0,0,0       0,0,0,0,0
> XO_SHI_RO_256_SS        false           0,0,0,0,0       0,1,0,2,1
> XO_SHI_RO_256_SS        true            0,0,0,0,0       0,1,1,1,2
> XO_SHI_RO_512_PLUS      false           0,0,0,0,0       0,0,0,2,2
> XO_SHI_RO_512_PLUS      true            0,0,0,0,0       1,0,0,0,1
> XO_SHI_RO_512_SS        false           0,0,0,0,0       0,1,0,1,0
> XO_SHI_RO_512_SS        true            0,0,0,0,0       0,1,1,0,0
>
> So, when bit-reversed, the JDK is terrible at Dieharder. At BigCrush it
> actually improves from terrible to less terrible. TWO_CMRES is a bit worse at
> Dieharder when bit-reversed but no different at BigCrush (it was already
> systematically failing 3 tests).
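For reference, bit reversal of a 32-bit output can be expressed with `Integer.reverse`, which moves the lowest bits into the highest positions, where BigCrush reportedly looks hardest. This sketch assumes the generator is exposed as a `java.util.function.IntSupplier`; `BitReversedSource` is a hypothetical wrapper, not the application's actual option.

```java
import java.util.function.IntSupplier;

/**
 * Hypothetical sketch: reverse the bit order of every 32-bit output so the
 * generator's low bits become the high bits seen by the test suite.
 */
final class BitReversedSource implements IntSupplier {
    private final IntSupplier delegate;

    BitReversedSource(IntSupplier delegate) {
        this.delegate = delegate;
    }

    @Override
    public int getAsInt() {
        // Integer.reverse swaps bit 0 with bit 31, bit 1 with bit 30, and so on.
        return Integer.reverse(delegate.getAsInt());
    }
}
```

Any `nextInt()` method reference, such as one from `java.util.SplittableRandom`, can be wrapped this way. Reversing twice restores the original stream.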


Is it the same version of "BigCrush"?  I'm surprised that TWO_CMRES
has so many more failures (bit-reversed or not).

>
> All the other generators have similar results when bit-reversed, so adding
> the bit-reversed results to the user guide does not appear worthwhile. I will
> archive these and they can be added later if required, for example to show a
> good generator against a bad one. This will only be relevant if the library
> adds reference implementations of bad generators.

It's on Abhishek's TODO list (e.g. "LCG").

> Currently the JDK is the only bad generator.
>
> Next:
>
> I have added a ‘results’ command to the stress test application that can
> generate these results tables. It requires some header information not found
> in the old results files, so it only works with the new results. It can
> generate the APT table for the user guide directly. Going forward, it will
> make it easy to update the results when more generators are added.
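As an illustration of what each table row holds, here is a hypothetical sketch that formats one plain-text row: the generator name, then the failure count of each trial for Dieharder and for BigCrush. It does not produce the APT markup of the real command, and the column widths are arbitrary choices.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch of one plain-text results row. */
final class ResultsRow {
    private ResultsRow() {}

    /** Formats "name  dieharder-counts  bigcrush-counts" in fixed-width columns. */
    static String format(String rng, List<Integer> dieharder, List<Integer> bigCrush) {
        // %-24s and %-16s left-align the first two columns and pad them with spaces.
        return String.format("%-24s%-16s%s", rng, join(dieharder), join(bigCrush));
    }

    /** Joins per-trial failure counts as "a,b,c" without spaces. */
    private static String join(List<Integer> counts) {
        return counts.stream()
                .map(c -> Integer.toString(c))
                .collect(Collectors.joining(","));
    }
}
```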
>
> The new results are named using the test suite (dh_ or tu_), optionally the
> bit-reversed flag (r_), the generator's position in the RandomSource enum
> (counting from 1) and the trial run:
>
> dh_1_1 = Dieharder for JDK trial 1
> tu_1_1 = BigCrush for JDK trial 1
> dh_r_2_3 = Dieharder bit reversed for WELL_512_A trial 3
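The naming scheme is regular enough to parse with a pattern. A sketch, assuming exactly the format in the examples above; `ResultName` is a hypothetical helper:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper for names such as dh_1_1, tu_1_1 and dh_r_2_3:
 * suite prefix (dh = Dieharder, tu = TestU01 BigCrush), optional r_ flag for
 * bit-reversed output, generator index (1 = JDK, 2 = WELL_512_A, ...) and trial.
 */
final class ResultName {
    private static final Pattern NAME = Pattern.compile("(dh|tu)_(r_)?(\\d+)_(\\d+)");

    final String suite;
    final boolean reversed;
    final int index;
    final int trial;

    ResultName(String suite, boolean reversed, int index, int trial) {
        this.suite = suite;
        this.reversed = reversed;
        this.index = index;
        this.trial = trial;
    }

    /** Parses a result name; throws IllegalArgumentException if it does not match. */
    static ResultName parse(String name) {
        Matcher m = NAME.matcher(name);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a result name: " + name);
        }
        // group(2) is null when the optional "r_" flag is absent.
        return new ResultName(m.group(1), m.group(2) != null,
                Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4)));
    }

    /** Rebuilds the name, inserting "r_" only for bit-reversed runs. */
    String format() {
        return suite + (reversed ? "_r_" : "_") + index + "_" + trial;
    }
}
```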
>
> I propose to:
>
> - Delete all the old results and add these new ones using a new directory 
> structure. All results can reside in a single directory.
> - Ignore for now the bit-reversed results.
> - Delete the old stress test code. The new code supersedes all functionality 
> of the old version.
> - Commit the new ‘results’ command when I have confirmed the APT table is 
> correctly generated.

+1

>
> Questions:
>
> 1. Do we stick to using 3 trials or update to 5 (because I have the results)?

+1

> 2. Do we remove the diehard_sums test result?
>
> I would recommend removing diehard_sums. It pollutes the results for most
> generators with spurious failures, so I think we should ignore it.
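If diehard_sums is dropped, the per-run failure count only needs a filter on the test name. A minimal sketch, assuming failures are recorded with names such as `diehard_dna:0`, as listed earlier; `FailureCounter` is hypothetical:

```java
import java.util.List;

/** Hypothetical sketch: count failed tests for one run, ignoring diehard_sums. */
final class FailureCounter {
    private FailureCounter() {}

    /** Counts failed tests, skipping any diehard_sums entry. */
    static long countIgnoringSums(List<String> failedTests) {
        return failedTests.stream()
                .filter(name -> !name.startsWith("diehard_sums"))
                .count();
    }
}
```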

+0 (as you wish)

Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org
