> On 19 Mar 2019, at 10:35, Gilles Sadowski <gillese...@gmail.com> wrote:
> 
>>> [...]
>>>> So leave the testing to just ints and document on the user guide that is
>>>> what we are testing.
>>> 
>>> +1
>> 
>> OK. That seems simplest.
>> 
>> Given all the stress tests will be rerun shall I go ahead and reorder the 
>> existing files, user guide .apt file and the GeneratorsList to be in the 
>> order of the RandomSource enum?
> 
> We could wait for the new results before updating the site.

I was going to rearrange it all and check that all the links in the local site 
still work. I have this scripted but have not yet run it. When new results are 
ready they can be written over the existing ones. Either way is fine with me, so 
let’s leave it until the new results are done and then check the site.

I will update the GeneratorsList to be autogenerated from the RandomSource enum.

> 
>> 
>> 
>> Big/Little Endian for Dieharder:
>> 
>> I’ve spent some time looking at the source code for Dieharder. It reads 
>> binary file data using this (taken from libdieharder/rng_file_input_raw.c):
>> 
>> unsigned int iret;
>> // ...
>> fread(&iret,sizeof(uint),1,state->fp);
>> 
>> So it reads single unsigned integers using fread().
>> 
>> Given that it is possible to run Dieharder using numbers from ASCII and 
>> binary input files, I set up a test. I created the files using an RNG with 
>> the same seed, writing the standard output from a DataOutputStream and the 
>> byte-reversed output using Integer.reverseBytes. Here’s what happens:
>> 
>>> dieharder -g 201 -d 0 -f raw.bin.rev
>>   diehard_birthdays|   0|       100|     100|0.89220858|  PASSED
>>> dieharder -g 202 -d 0 -f raw.txt
>>   diehard_birthdays|   0|       100|     100|0.89220858|  PASSED
>> 
>>> dieharder -g 201 -d 0 -f raw.bin
>>   diehard_birthdays|   0|       100|     100|0.30776452|  PASSED
>>> dieharder -g 202 -d 0 -f raw.txt.rev
>>   diehard_birthdays|   0|       100|     100|0.30776452|  PASSED
>> 
>>> cat raw.bin | dieharder -g 200 -d 0
>>   diehard_birthdays|   0|       100|     100|0.30776452|  PASSED
>> 
>> 
>> Note the reversed byte sequence (.rev suffix) is required to get the same 
>> results from the binary (.bin) file as from the text (.txt) file.
>> 
>> So the binary read of Dieharder is using the little endian representation, 
>> as was required for TestU01.
>> 
>> I had modified the stdin2testu01.c bridge to detect if the system was little 
>> endian and then correct the input data by reversing the bytes. It may be a 
>> better idea to write a test C program to detect the endianness of the system 
>> for reference. Then update the stress test benchmark to have an argument for 
>> little or big endian output when piping the int data to the command line 
>> program.
>> 
>> I think it is important to get the endianness of the data correct. Dieharder, 
>> at least, runs tests on tuples of bits from the data, and these can span 
>> multiple bytes. For example, the sts_serial test (-d 102) uses overlapping 
>> n-tuples of bits with n from 1 to 16. Other tests that use non-overlapping 
>> tuples, such as rgb_bitdist (-d 200), use n from 1 to 12.
>> 
>> Reversing the bytes in the Java code is the easiest option.
> 
> +1
> [With an option flag for selecting whether the output should be BE or LE.]
> 

OK. I will consolidate all this and update the stress_test.md instructions to 
make it clear that endianness needs to be considered.
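
As a rough illustration of the option flag, here is a minimal sketch of how the 
int output could be written in either byte order. The class name EndianOutput, 
the "LE" argument and the use of SplittableRandom as a stand-in generator are 
all assumptions made for this example; none of it is the actual stress test 
code.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.SplittableRandom;

/** Hypothetical helper: converts 32-bit ints to bytes in a chosen byte order. */
final class EndianOutput {

    /** Returns the 4 bytes of {@code value} in the requested order. */
    static byte[] toBytes(int value, ByteOrder order) {
        // A new ByteBuffer is big endian by default, so the order is set explicitly.
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical flag: "LE" selects little endian, anything else big endian.
        ByteOrder order = args.length > 0 && "LE".equalsIgnoreCase(args[0])
            ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN;
        // Report the platform byte order for reference (to stderr, not the data stream).
        System.err.println("native=" + ByteOrder.nativeOrder() + " output=" + order);

        // Stand-in generator with a fixed seed; not a Commons RNG RandomSource.
        SplittableRandom rng = new SplittableRandom(42L);
        try (OutputStream out = new BufferedOutputStream(System.out)) {
            for (int i = 0; i < 1000; i++) {
                out.write(toBytes(rng.nextInt(), order));
            }
        }
    }
}
```

Writing little endian this way produces the same bytes as passing 
Integer.reverseBytes(value) to DataOutputStream.writeInt, which always writes 
big endian.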

Should I add the raw data dumper to the source base? This runs a named 
RandomSource for a given number of iterations with a provided seed and outputs 
4 files: Dieharder text format and raw binary, with standard order and byte 
reversed. It may be useful if debugging the output of RNGs ever needs to be 
done again.
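
For reference, a minimal sketch of what such a dumper might look like. This is 
not the actual tool: the class name RawDataDumper is hypothetical, 
SplittableRandom stands in for the named RandomSource, and the text header 
(type, count, numbit lines) is written as I understand Dieharder's ASCII input 
format, so it should be checked against the Dieharder documentation. The file 
names follow the ones used in the test above.

```java
import java.io.BufferedOutputStream;
import java.io.BufferedWriter;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.SplittableRandom;

/** Hypothetical sketch: dumps the same int sequence in four formats. */
final class RawDataDumper {

    static void dump(Path dir, long seed, int count) throws IOException {
        // Stand-in generator (assumption); swap in the RandomSource being debugged.
        int[] values = new SplittableRandom(seed).ints(count).toArray();
        writeText(dir.resolve("raw.txt"), values, seed, false);
        writeText(dir.resolve("raw.txt.rev"), values, seed, true);
        writeBinary(dir.resolve("raw.bin"), values, false);
        writeBinary(dir.resolve("raw.bin.rev"), values, true);
    }

    /** Writes one unsigned decimal per line after an assumed Dieharder-style header. */
    private static void writeText(Path file, int[] values, long seed, boolean reverse)
            throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.US_ASCII)) {
            w.write("#==================================================================\n");
            w.write("# generator SplittableRandom  seed = " + seed + "\n");
            w.write("#==================================================================\n");
            w.write("type: d\ncount: " + values.length + "\nnumbit: 32\n");
            for (int v : values) {
                w.write(Integer.toUnsignedString(reverse ? Integer.reverseBytes(v) : v));
                w.write('\n');
            }
        }
    }

    /** Writes raw 4-byte ints; DataOutputStream.writeInt is always big endian. */
    private static void writeBinary(Path file, int[] values, boolean reverse)
            throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            for (int v : values) {
                out.writeInt(reverse ? Integer.reverseBytes(v) : v);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        long seed = args.length > 0 ? Long.parseLong(args[0]) : 1L;
        int count = args.length > 1 ? Integer.parseInt(args[1]) : 1000;
        dump(Paths.get("."), seed, count);
    }
}
```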

Alex

