[rng] stress test results

Alex Herbert Thu, 16 May 2019 03:07:07 -0700

I have run the stress test using the new application. The new application has 
two major changes over the previous application:


1. It detects the platform byte order and sends the bits in the correct order 
to be read by a C application.
2. The bridge to TestU01 has been updated to use all the input int values; 
previously it was using every other int value.

So we can expect differences from both test suites, Dieharder and TestU01 
BigCrush.
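
To illustrate the byte-order point in change 1, here is a minimal sketch of 
packing int output in the platform byte order before piping it to a C program. 
The class and method names are hypothetical and this is not the stress test 
application's actual code; it only shows the standard java.nio calls that 
could be used.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper, for illustration only. */
final class NativeOrderWriter {
    /** Packs int values into bytes using the given byte order. */
    static byte[] toBytes(int[] values, ByteOrder order) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Integer.BYTES).order(order);
        for (int v : values) {
            buffer.putInt(v);
        }
        return buffer.array();
    }

    /** Packs int values using the byte order of the platform running the JVM. */
    static byte[] toNativeBytes(int[] values) {
        return toBytes(values, ByteOrder.nativeOrder());
    }
}
```

A C program that reads 4 bytes at a time into a 32-bit unsigned integer on the 
same machine would then see the same values that the generator produced.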

For reference here are the old results (from the user guide, reordered to the 
RandomSource enum order):

RNG                     Dieharder       TestU01 (BigCrush)
JDK                     11, 12, 13      74, 72, 75
WELL_512_A              0, 0, 0         7, 6, 6
WELL_1024_A             0, 0, 0         4, 4, 5
WELL_19937_A            0, 0, 0         3, 2, 3
WELL_19937_C            0, 1, 0         2, 2, 3
WELL_44497_A            0, 0, 0         2, 3, 3
WELL_44497_B            0, 0, 0         2, 2, 2
MT                      0, 1, 0         3, 2, 2
ISAAC                   0, 0, 1         0, 1, 0
SPLIT_MIX_64            0, 0, 0         2, 0, 0
XOR_SHIFT_1024_S        0, 0, 0         2, 0, 0
TWO_CMRES               1, 1, 1         0, 0, 1
MT_64                   0, 0, 1         3, 2, 3
MWC_256                 0, 0, 0         0, 0, 0
KISS                    0, 0, 0         1, 2, 0

Here are the new results:

RNG                     Dieharder       TestU01 (BigCrush)
JDK                     4,4,4,4,4       74,72,74,73,74    
WELL_512_A              0,0,0,0,0       7,6,6,6,6         
WELL_1024_A             0,0,0,0,0       4,4,5,4,4         
WELL_19937_A            0,1,0,0,1       3,3,2,2,2         
WELL_19937_C            0,0,0,0,0       2,2,3,2,2         
WELL_44497_A            0,0,0,0,0       2,2,2,2,3         
WELL_44497_B            0,0,0,0,0       2,3,2,2,2         
MT                      0,0,0,0,0       2,3,2,2,2         
ISAAC                   0,0,0,0,0       0,1,2,0,0         
SPLIT_MIX_64            0,0,0,0,0       1,0,0,0,0         
XOR_SHIFT_1024_S        0,0,0,0,0       0,0,0,0,0         
TWO_CMRES               2,2,2,2,2       4,3,3,5,4         
MT_64                   0,0,0,0,0       2,3,2,2,2         
MWC_256                 0,1,0,0,0       0,0,0,2,0         
KISS                    0,0,0,0,0       0,0,0,0,0         
XOR_SHIFT_1024_S_PHI    0,0,0,0,0       0,0,0,0,0         
XO_RO_SHI_RO_64_S       0,0,0,0,0       1,1,2,1,3         
XO_RO_SHI_RO_64_SS      0,0,0,0,0       0,0,0,0,0         
XO_SHI_RO_128_PLUS      0,0,1,0,0       1,2,2,1,1         
XO_SHI_RO_128_SS        0,0,0,1,0       0,1,0,0,0         
XO_RO_SHI_RO_128_PLUS   0,0,0,0,0       0,1,0,0,0         
XO_RO_SHI_RO_128_SS     0,0,0,0,0       1,0,1,0,0         
XO_SHI_RO_256_PLUS      0,1,0,0,0       0,0,0,0,0         
XO_SHI_RO_256_SS        0,0,0,0,0       0,1,0,2,1         
XO_SHI_RO_512_PLUS      0,0,0,0,1       0,0,0,2,2         
XO_SHI_RO_512_SS        0,0,0,0,0       0,1,0,1,0

(Note: All of the single fails except one under Dieharder are for the flawed 
diehard_sums test. I include it here for direct comparison with old results. I 
would recommend we strip this from the new results for the user guide.)

I ran them 3 times. Then, because the results were different (mainly for the 
JDK generator for Dieharder), I double-checked everything and ran another 2. 
The results are still the same. Dieharder is much better for the JDK than 
previously. The JDK generator now systematically fails:

diehard_opso:0
diehard_oqso:0
diehard_dna:0
dab_bytedistrib:0

The TWO_CMRES generator is now worse as it is systematically failing:

diehard_oqso:0
diehard_dna:0

The results from BigCrush are similar for JDK and all the others except 
TWO_CMRES, which is now failing a few more tests. It systematically fails:

1  SerialOver, r = 0
41  Permutation, t = 5
42  Permutation, t = 7
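
Reading "systematically fails" as "fails in every trial", the lists above can 
be derived by intersecting the sets of failed test names across trials. This 
is a minimal sketch under that assumption; the class and method names are 
hypothetical and the failed test names are assumed to have already been 
extracted from the result files.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper, for illustration only. */
final class SystematicFailures {
    /** Returns the names of the tests that failed in every trial. */
    static Set<String> inAllTrials(List<Set<String>> failuresPerTrial) {
        if (failuresPerTrial.isEmpty()) {
            return new TreeSet<>();
        }
        // Start from the first trial and keep only names present in all trials.
        Set<String> common = new TreeSet<>(failuresPerTrial.get(0));
        for (Set<String> trial : failuresPerTrial) {
            common.retainAll(trial);
        }
        return common;
    }
}
```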

To check the JDK results for Dieharder I ran it 5 times using the wrong 
platform byte order (i.e. what the previous test application was doing).

Old results: 11, 12, 13
New results: 11, 16, 14, 14, 15

So this matches up. If the JDK output is byte reversed it is a poor generator.

A few sources I have read indicate that BigCrush favours the upper bits of a 
generator. A test should therefore run the generator bit reversed through the 
test application. Here are the full forward and backward results ignoring the 
Diehard sums test:

RNG                     Bit-reversed    Dieharder       TestU01 (BigCrush)
JDK                     false           4,4,4,4,4       74,72,74,73,74    
JDK                     true            42,42,43,49,49  35,34,35,36,36    
WELL_512_A              false           0,0,0,0,0       7,6,6,6,6         
WELL_512_A              true            0,0,1,0,0       7,6,6,7,6         
WELL_1024_A             false           0,0,0,0,0       4,4,5,4,4         
WELL_1024_A             true            0,0,0,0,0       4,4,4,4,4         
WELL_19937_A            false           0,1,0,0,0       3,3,2,2,2         
WELL_19937_A            true            0,0,0,0,0       3,2,2,2,3         
WELL_19937_C            false           0,0,0,0,0       2,2,3,2,2         
WELL_19937_C            true            0,0,0,0,0       3,2,2,3,2         
WELL_44497_A            false           0,0,0,0,0       2,2,2,2,3         
WELL_44497_A            true            0,0,0,0,0       3,3,3,2,2         
WELL_44497_B            false           0,0,0,0,0       2,3,2,2,2         
WELL_44497_B            true            0,0,0,0,0       2,2,2,2,3         
MT                      false           0,0,0,0,0       2,3,2,2,2         
MT                      true            0,0,0,0,0       2,2,3,3,3         
ISAAC                   false           0,0,0,0,0       0,1,2,0,0         
ISAAC                   true            0,0,0,0,0       0,0,0,0,0         
SPLIT_MIX_64            false           0,0,0,0,0       1,0,0,0,0         
SPLIT_MIX_64            true            0,0,0,0,0       0,1,0,0,0         
XOR_SHIFT_1024_S        false           0,0,0,0,0       0,0,0,0,0         
XOR_SHIFT_1024_S        true            0,0,0,0,0       0,0,1,0,0         
TWO_CMRES               false           2,2,2,2,2       4,3,3,5,4         
TWO_CMRES               true            7,5,5,7,6       4,3,4,4,4         
MT_64                   false           0,0,0,0,0       2,3,2,2,2         
MT_64                   true            0,0,0,0,0       2,2,2,2,2         
MWC_256                 false           0,0,0,0,0       0,0,0,2,0         
MWC_256                 true            0,0,0,0,0       1,0,0,0,0         
KISS                    false           0,0,0,0,0       0,0,0,0,0         
KISS                    true            0,0,0,0,0       0,0,1,0,1         
XOR_SHIFT_1024_S_PHI    false           0,0,0,0,0       0,0,0,0,0         
XOR_SHIFT_1024_S_PHI    true            0,0,0,0,0       0,0,2,0,0         
XO_RO_SHI_RO_64_S       false           0,0,0,0,0       1,1,2,1,3         
XO_RO_SHI_RO_64_S       true            0,0,0,0,0       2,2,2,2,2         
XO_RO_SHI_RO_64_SS      false           0,0,0,0,0       0,0,0,0,0         
XO_RO_SHI_RO_64_SS      true            0,0,0,0,0       1,0,0,0,0         
XO_SHI_RO_128_PLUS      false           0,0,0,0,0       1,2,2,1,1         
XO_SHI_RO_128_PLUS      true            0,0,0,0,0       2,2,2,2,2         
XO_SHI_RO_128_SS        false           0,0,0,0,0       0,1,0,0,0         
XO_SHI_RO_128_SS        true            0,0,0,0,0       0,0,0,0,0         
XO_RO_SHI_RO_128_PLUS   false           0,0,0,0,0       0,1,0,0,0         
XO_RO_SHI_RO_128_PLUS   true            0,0,0,0,0       2,1,1,1,2         
XO_RO_SHI_RO_128_SS     false           0,0,0,0,0       1,0,1,0,0         
XO_RO_SHI_RO_128_SS     true            0,0,0,0,0       0,0,2,0,0         
XO_SHI_RO_256_PLUS      false           0,0,0,0,0       0,0,0,0,0         
XO_SHI_RO_256_PLUS      true            0,0,0,0,0       0,0,0,0,0         
XO_SHI_RO_256_SS        false           0,0,0,0,0       0,1,0,2,1         
XO_SHI_RO_256_SS        true            0,0,0,0,0       0,1,1,1,2         
XO_SHI_RO_512_PLUS      false           0,0,0,0,0       0,0,0,2,2         
XO_SHI_RO_512_PLUS      true            0,0,0,0,0       1,0,0,0,1         
XO_SHI_RO_512_SS        false           0,0,0,0,0       0,1,0,1,0         
XO_SHI_RO_512_SS        true            0,0,0,0,0       0,1,1,0,0 

So, when bit-reversed, the JDK is terrible at Dieharder. It actually improves 
for BigCrush from terrible to less terrible. TWO_CMRES is a bit worse when 
bit-reversed at Dieharder but no different at BigCrush (it was already 
systematically failing 3 tests).
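
For clarity on the two transformations discussed in this message, here is a 
small sketch that assumes "bit reversed" means reversing all 32 bits of each 
int output and "byte reversed" means swapping the order of its 4 bytes. The 
class and method names are hypothetical; the Integer methods are standard JDK 
calls.

```java
/** Hypothetical illustration, not the stress test application's code. */
final class OutputTransforms {
    /** Reverses all 32 bits so that the lowest bit becomes the highest. */
    static int bitReversed(int value) {
        return Integer.reverse(value);
    }

    /** Swaps the byte order only; bits within each byte keep their positions. */
    static int byteReversed(int value) {
        return Integer.reverseBytes(value);
    }
}
```

Bit reversal moves the low-order bits of the generator into the positions 
that a test favouring the upper bits will examine most closely.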

All the other generators have similar results when bit-reversed, so adding the 
bit-reversed results to the user guide does not appear worthwhile. I will 
archive these and they can be added later if required, for example to show a 
good generator against a bad one. This will only be relevant if the library 
adds reference implementations of bad generators. Currently the JDK is the 
only bad generator.

Next:

I have added a ‘results’ command to the stress test application that can 
generate these results tables. It requires some header information not found in 
the old results files, so it only works with the new results. It can generate 
the APT table directly for the user guide. This will make it easy to update 
the results when more generators are added.

The new results are named using the test suite (dh_ or tu_), optionally the 
bit-reversed flag (r_), the enum ordinal and the trial run:

dh_1_1 = Dieharder for JDK trial 1
tu_1_1 = BigCrush for JDK trial 1
dh_r_2_3 = Dieharder bit reversed for WELL_512_A trial 3
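
The naming scheme above can be parsed with a simple regular expression. This 
is a minimal sketch and the class name and fields are hypothetical. Note that 
the examples number JDK as 1 and WELL_512_A as 2, so the number appears to be 
1-based, whereas Java's Enum.ordinal() starts at 0; the sketch keeps the 
number exactly as written in the file name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for result names such as dh_1_1 or dh_r_2_3. */
final class ResultName {
    // suite prefix, optional bit-reversed flag, generator number, trial number
    private static final Pattern NAME = Pattern.compile("(dh|tu)_(r_)?(\\d+)_(\\d+)");

    final String suite;          // "dh" (Dieharder) or "tu" (TestU01 BigCrush)
    final boolean bitReversed;   // true when the r_ flag is present
    final int generatorIndex;    // number as written in the name
    final int trial;             // trial run number

    private ResultName(String suite, boolean bitReversed, int generatorIndex, int trial) {
        this.suite = suite;
        this.bitReversed = bitReversed;
        this.generatorIndex = generatorIndex;
        this.trial = trial;
    }

    /** Parses a result name, rejecting anything that does not match the scheme. */
    static ResultName parse(String name) {
        Matcher m = NAME.matcher(name);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognised result name: " + name);
        }
        return new ResultName(m.group(1), m.group(2) != null,
                Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4)));
    }
}
```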

I propose to:

- Delete all the old results and add these new ones using a new directory 
structure. All results can reside in a single directory.
- Ignore for now the bit-reversed results.
- Delete the old stress test code. The new code supersedes all functionality of 
the old version.
- Commit the new ‘results’ command when I have confirmed the APT table is 
correctly generated.

Questions:

1. Do we stick to using 3 trials or update to 5 (because I have the results)?
2. Do we remove the diehard_sums test result?

I would recommend removing diehard_sums. It pollutes the results for most 
generators with a spurious fail that should be ignored.
