[
https://issues.apache.org/jira/browse/RNG-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18061645#comment-18061645
]
Alex Herbert commented on RNG-191:
----------------------------------
Benchmarks added in RNG-188 show the Philox4x64 generator performance can be
significantly improved using the multiply methods in Math.
I modified the current core component that computes the unsigned multiply of
two longs and tested performance using the JMH module benchmarks.
There was inconsistent timing output from the LXM family of generators. These
generators have a linear congruential generator combined with a Xor based
generator (XBG). Only the update of the 128-bit LCG requires the
multiplication. The update of the XBG and computation of the mixed output from
both generators are independent. An efficient compiler and processor would be
able to pipeline these computations in parallel.
||Method||Description||
|original|The current RNG 1.7-SNAPSHOT code|
|um|Version modified to use java.lang.Math unsignedMultiplyHigh (java 18+);
multiplyHigh (java 9+); or default to the software implementation |
|um1|As um but using a variant for the order of the operations in the LXM
family of generators|
|um2|As um but using a variant for the order of the operations in the LXM
family of generators|
|um3|As um but using a variant for the order of the operations in the LXM
family of generators|
|um4|As um but using a variant for the order of the operations in the LXM
family of generators|
h2. Results
Processor: M2 Max; OpenJDK 17.0.17; 21.0.9. Results are in ns/operation. The
JMH baseline of 0.4ns has been subtracted from the timing results.
Next Double
| |17| | |21| | | | | |
|RNG|original|um|um1|original|um|um1|um2|um3|um4|
|L128_X1024_MIX|3.602|3.326|3.306|3.556|4.473|4.471|4.073|3.359|3.414|
|L128_X128_MIX|3.332|4.844|4.814|3.344|5.268|5.284|3.454|3.457|3.504|
|L128_X256_MIX|6.807|5.147|5.122|6.858|3.377|3.372|4.015|4.171|3.223|
|PHILOX_4X64|9.170|3.867|3.808|9.262|3.166|2.455|2.473|2.444|2.446|
Next Long
| |17| | |21| | | | | |
|RNG|original|um|um1|original|um|um1|um2|um3|um4|
|L128_X1024_MIX|3.344|5.289|5.404|3.453|5.007|5.052|3.440|3.235|3.337|
|L128_X128_MIX|3.315|3.345|3.511|3.330|4.104|4.145|3.427|4.812|4.918|
|L128_X256_MIX|3.279|4.922|4.999|3.286|4.556|4.596|3.822|4.470|4.854|
|PHILOX_4X64|9.250|3.633|3.672|9.186|2.162|2.189|2.210|2.169|2.226|
I only ran variants um2-4 on JDK 21 to determine if the change was impactful.
This was not the case.
Note that the next double output should be correlated with the next long
output. It uses a long and then performs a shift and conversion of 53-bits to a
double. So the time should be similar. However some results are clearly not
close, for example the L128_X256_MIX generator is much slower for double
generation using the original code than long generation. This inconsistency is
observed across the LXM family in various results where double generation can
be faster or slower than long generation. The JVM optimisations may be applied
differently to the two cases during the benchmark.
The Philox generator is very performance sensitive to the unsigned multiply
method. It is slow in the original code and increases in speed under JDK 17 and
then again under JDK 21. These results reflect the observations from RNG-188
when testing variants that use the Math functions.
The LXM generators L128_X128_MIX and L128_X1024_MIX are slowed down by use of
the Math multiply methods. Reasons for this are unclear. If the call to compute
the upper 128-bit multiplication result is faster then other optimisation made
by the JVM under this case lead to an inconsistent slowdown overall. The
L128_X256_MIX double generation improves under JDK 21 but not JDK 17. The
generation of long values is slower under JDK 17 and 21.
h2. Conclusion
Since the LXM family perform two independent update steps per output cycle, one
for the LCG and one for the XBG, it may be that increasing the speed of only
the LCG has no effect on overall runtime. These benchmarks show no significant
improvement in performance and often a negative slowdown of the generators.
Further testing is required using different hardware and JVMs, and possibly
different benchmarks such as use of the RNG with a random sampler to generate
output.
The Philox4x64 generator update step relies heavily on the 128-bit
multiplication. Performance gains are significant and consistent across
benchmark runs.
> Use java.lang.invoke to dynamically call Math multiply high methods
> -------------------------------------------------------------------
>
> Key: RNG-191
> URL: https://issues.apache.org/jira/browse/RNG-191
> Project: Commons RNG
> Issue Type: Improvement
> Components: core
> Reporter: Alex Herbert
> Priority: Minor
>
> The following generators rely on a 128-bit multiplication result from two
> 64-bit longs:
> * L128_X128_MIX
> * L128_X256_MIX
> * L128_X1024_MIX
> * PHILOX_4X64
> Java added support to java.lang.Math to compute the upper 64-bit result with
> potential intrinsic calls to native functions:
> ||Method||JDK||Notes||
> |[multiplyHigh (Javadoc JDK
> 21)|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Math.html#multiplyHigh(long,long)]|9|Convert
> to unsigned using:
> Math.multiplyHigh(a, b) + ((a >> 63) & b) + ((b >> 63) & a)|
> |[unsignedMultiplyHigh (Javadoc JDK
> 21)|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Math.html#unsignedMultiplyHigh(long,long)]|18|
> |
> Since Commons RNG targets Java 8 these methods cannot be used. The current
> RNGs use a software implementation to compose the upper bits of the 128-bit
> result. However the methods can be used with a
> [MethodHandle|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/invoke/MethodHandle.html]
> introduced in Java 7:
> {code:java}
> // find the method
> MethodHandle h = MethodHandles.publicLookup()
> .findStatic(Math.class,
> "unsignedMultiplyHigh",
> MethodType.methodType(long.class, long.class, long.class));
> // invoke (snippet)
> long a, b;
> try {
> long r = (long) h2.invokeExact(a, b);
> } catch (Throwable e) {
> throw new RuntimeException(e);
> }
> {code}
> Investigate the use of MethodHandle within the named RNGs.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)