Hello again,

- lrand48 (48-bit state as 3 uint16) is 29 ops
  (10 =, 8 *, 7 +, 4 >>)

- xoshiro256** (256-bit state as 4 uint64) is 24 ops (18 if rot in hw)
  (8 =, 2 *, 2 +, 5 <<, 5 ^, 2 |)
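
For reference, here is a minimal sketch of one xoshiro256** step in C, following the public-domain formulation by Blackman and Vigna. The type and function names (xoshiro256ss_state, xoshiro256ss_seed, xoshiro256ss_next) are made up for this sketch, and seeding through splitmix64 is a common convention assumed here, not something stated in this thread. The operation tally above may count things slightly differently than this formulation suggests (for instance, multiplications by 5 and 9 written as shift-and-add).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state type for this sketch: 256 bits held as 4 uint64. */
typedef struct { uint64_t s[4]; } xoshiro256ss_state;

/* Rotate left by k bits (0 < k < 64); compilers usually emit a single
 * rotate instruction for this pattern when the hardware has one. */
static inline uint64_t rotl64(uint64_t x, int k)
{
    return (x << k) | (x >> (64 - k));
}

/* One splitmix64 step, used here only to expand a 64-bit seed. */
static uint64_t splitmix64(uint64_t *x)
{
    uint64_t z = (*x += 0x9e3779b97f4a7c15ULL);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* Fill the four state words from a single 64-bit seed (assumed convention). */
void xoshiro256ss_seed(xoshiro256ss_state *st, uint64_t seed)
{
    for (int i = 0; i < 4; i++)
        st->s[i] = splitmix64(&seed);
    /* The all-zero state is a fixed point and must never be used. */
    assert((st->s[0] | st->s[1] | st->s[2] | st->s[3]) != 0);
}

/* Advance the generator and return 64 pseudo-random bits. */
uint64_t xoshiro256ss_next(xoshiro256ss_state *st)
{
    uint64_t *s = st->s;
    const uint64_t result = rotl64(s[1] * 5, 7) * 9;  /* the "**" scrambler */
    const uint64_t t = s[1] << 17;

    s[2] ^= s[0];
    s[3] ^= s[1];
    s[1] ^= s[2];
    s[0] ^= s[3];
    s[2] ^= t;
    s[3] = rotl64(s[3], 45);

    return result;
}
```

Starting from the state {1, 2, 3, 4}, the first two outputs of this sketch are 11520 and 0, which is a handy sanity check when porting it.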

Small benchmark on my laptop with gcc-7.3 -O3:

- pg_lrand48 takes 4.0 seconds to generate 1 billion 32-bit ints
- xoshiro256** takes 1.6 seconds to generate 1 billion 64-bit ints

These figures look too good to be true, and after giving it some thought I do not trust them: the function call is probably inlined, and other optimizations are performed which would not apply in a more realistic case.

With gcc -O2 it is 4.8 and 3.4 seconds, respectively.

I trust this one, which is reasonably consistent with the operation count.

More tests with "clang -O2" yielded 3.2 and 1.6 seconds respectively, which I do not trust either.

I then compiled the generators separately from the benchmark loop to prevent inlining and other undesirable optimizations: clang -Ofast or -O2 gives 5.2 and 3.9 seconds, gcc -Ofast gives 5.4 and 3.5 seconds.
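
A rough single-file approximation of such a measurement is sketched below. It is not the harness used for the numbers above: toy_next() is a hypothetical stand-in generator (a 64-bit LCG with Knuth's MMIX constants), bench() is a made-up helper, and the noinline attribute (a gcc/clang extension) is used in place of true separate compilation, which it only partly imitates.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in generator state; replace with the PRNG under test. */
static uint64_t toy_state = 42;

/* noinline keeps the call in the loop, roughly as separate compilation
 * without link-time optimization would. */
__attribute__((noinline))
uint64_t toy_next(void)
{
    toy_state = toy_state * 6364136223846793005ULL + 1442695040888963407ULL;
    return toy_state;
}

/* Call toy_next() n times and return the elapsed CPU time in seconds.
 * The XOR of all outputs is stored in *sink so that the compiler cannot
 * discard the calls as dead code. */
double bench(uint64_t n, uint64_t *sink)
{
    assert(sink != NULL);

    uint64_t acc = 0;
    clock_t start = clock();

    for (uint64_t i = 0; i < n; i++)
        acc ^= toy_next();

    clock_t stop = clock();

    *sink = acc;
    return (double) (stop - start) / CLOCKS_PER_SEC;
}
```

Calling bench(1000000000, &sink) and printing both the returned time and sink gives one data point; comparing runs across compilers and optimization levels then shows how much of a result depends on the call being visible to the optimizer.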

It seems that clang is better at compiling pg_erand48 but gcc is better at xoshiro256**.

So significantly better speed _and_ quality are quite achievable.

xoshiro256** is about 1/3 faster while producing twice as many pseudo-random bits per call, and it passed significant randomness tests that an LCG PRNG would not.

--
Fabien.
