On Tue, 4 Oct 2022 17:36:56 GMT, Chris Hennick <d...@openjdk.org> wrote:

>> This PR improves both the performance of `nextExponential` and
>> `nextGaussian` and the distribution of output at the tails. It fixes the
>> following imperfections:
>>
>> * Repeatedly adding DoubleZigguratTables.exponentialX0 to extra causes a
>> rounding error to accumulate at the tail of the distribution (probably
>> starting around `2*exponentialX0 == 0x1.e46eff20739afp3 ~ 15.1`); this PR
>> fixes that by tracking the multiple of exponentialX0 as a long. (This
>> distortion is worst when `x > 0x1.0p56` since in that case, a rounding error
>> means `extra + x == extra`.)
>> * Reduces several equations using `Math.fma`. (This will almost certainly
>> improve performance, and may or may not improve output distribution.)
>> * Uses the newly-extracted `computeWinsorizedNextExponential` function to
>> greatly reduce the probability that `nextGaussian` suffers from *two* tail
>> cases of `nextExponential`.
>
> Chris Hennick has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Add parameter to enable/disable fixed PRNG seed

JMH benchmark results (`make test TEST="micro:java.util.random"`) with `fixedSeed = true` on EC2 are below. Average time improves only for `nextExponential()` and only on the non-Intel CPUs (c7g and m6a); on c6i.metal the `nextExponential()` average gets slower, and the `nextGaussian()` averages are essentially unchanged everywhere. p100 improves more consistently: 7 of the 12 tests improve by at least 1000 ns, and only 2 deteriorate by that amount.
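
For readers who want to see the rounding issue from the first bullet of the quoted description in isolation, here is a minimal sketch. It is not the PR's code: the class and method names are hypothetical, and `X0` is an assumption derived from the quoted `2*exponentialX0 == 0x1.e46eff20739afp3` rather than read from `DoubleZigguratTables`. It compares accumulating `X0` by repeated addition against keeping the multiple as a `long` and applying it once through `Math.fma`.

```java
final class TailAccumulationSketch {
    // Assumption: half of the quoted 2*exponentialX0 value; not read from
    // DoubleZigguratTables.
    static final double X0 = 0x1.e46eff20739afp2;

    /** Accumulates X0 by repeated addition; every step may round. */
    static double byRepeatedAddition(long steps) {
        double extra = 0.0;
        for (long i = 0; i < steps; i++) {
            extra += X0;
        }
        return extra;
    }

    /**
     * Keeps the multiple as a long and applies it once. Math.fma computes
     * steps * X0 + x with a single rounding (the cast to double is exact
     * for steps below 2^53).
     */
    static double byCountedMultiple(long steps, double x) {
        return Math.fma((double) steps, X0, x);
    }

    public static void main(String[] args) {
        // At 2^56 adjacent doubles are 16 apart, so adding X0 (about 7.57)
        // rounds back to the same value.
        double big = 0x1.0p56;
        System.out.println("X0 absorbed at 2^56: " + (big + X0 == big));

        long steps = 1_000_000L;
        double added = byRepeatedAddition(steps);
        double counted = byCountedMultiple(steps, 0.0);
        System.out.println("repeated addition: " + added);
        System.out.println("counted multiple:  " + counted);
        System.out.println("difference:        " + (added - counted));
    }
}
```

The size of the printed difference depends on the step count. The point is only that the counted form rounds once, while the loop can round on every iteration.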

# c6i.metal
L64X128MixRandom.nextExponential() average ns/op: 26.984 ± 0.015 before, 34.914 ± 0.021 after
L64X128MixRandom.nextExponential() p100 ns/op: 8224 before, 5624 after
L64X1024MixRandom.nextExponential() average ns/op: 30.026 ± 0.020 before, 36.730 ± 0.024 after
L64X1024MixRandom.nextExponential() p100 ns/op: 8736 before, 7464 after
L64X128MixRandom.nextGaussian() average ns/op: 29.995 ± 0.017 before, 30.016 ± 0.019 after
L64X128MixRandom.nextGaussian() p100 ns/op: 5392 before, 3024 after
L64X1024MixRandom.nextGaussian() average ns/op: 31.394 ± 0.020 before, 31.660 ± 0.021 after
L64X1024MixRandom.nextGaussian() p100 ns/op: 5984 before, 6072 after

# c7g.16xlarge
L64X128MixRandom.nextExponential() average ns/op: 41.006 ± 0.056 before, 40.288 ± 0.055 after
L64X128MixRandom.nextExponential() p100 ns/op: 6384 before, 6416 after
L64X1024MixRandom.nextExponential() average ns/op: 45.218 ± 0.071 before, 40.549 ± 0.062 after
L64X1024MixRandom.nextExponential() p100 ns/op: 560 before, 3760 after
L64X128MixRandom.nextGaussian() average ns/op: 39.643 ± 0.052 before, 40.473 ± 0.070 after
L64X128MixRandom.nextGaussian() p100 ns/op: 7728 before, 1264 after
L64X1024MixRandom.nextGaussian() average ns/op: 40.393 ± 0.061 before, 40.274 ± 0.061 after
L64X1024MixRandom.nextGaussian() p100 ns/op: 9184 before, 1280 after

# m6a.metal
L64X128MixRandom.nextExponential() average ns/op: 31.241 ± 0.039 before, 29.004 ± 0.018 after
L64X128MixRandom.nextExponential() p100 ns/op: 4224 before, 7224 after
L64X1024MixRandom.nextExponential() average ns/op: 31.903 ± 0.017 before, 31.146 ± 0.021 after
L64X1024MixRandom.nextExponential() p100 ns/op: 9744 before, 8688 after
L64X128MixRandom.nextGaussian() average ns/op: 29.164 ± 0.017 before, 29.073 ± 0.021 after
L64X128MixRandom.nextGaussian() p100 ns/op: 5920 before, 1504 after
L64X1024MixRandom.nextGaussian() average ns/op: 29.503 ± 0.022 before, 29.639 ± 0.018 after
L64X1024MixRandom.nextGaussian() p100 ns/op: 4256 before, 4288 after

-------------

PR: https://git.openjdk.org/jdk/pull/8131