Thanks, Maurizio.
On 12/9/24 03:42, Maurizio Cimadamore wrote:
Great work Vlad!
The simdsort part seems a more "classic" FFM binding - where you have a
method handle per entry point. That seems to fit the design of FFM
rather well. In the second case (SVML/SLEEF), usage of FFM is limited to
building a "table of entry points" (e.g. we're just using SymbolLookup +
MemorySegment here -- the invocation part is intrinsified as part of the
new VectorSupport methods).
I'd say that both the simdsort and SVML/SLEEF cases are slightly off from
the sweet spot the FFM API is designed for, since all 3 libraries heavily
rely on CPU dispatching.
If it helps, it might be possible to define a custom (JDK-internal)
family of value layouts for vector types. Then we could enhance the
Linker classification to support such layouts. This means you could call
into native functions with vector parameter and return types using the
Linker API more directly. Not sure if it would give you the same
performance, but it's also an approach worth exploring.
FTR I experimented a bit with vector calling convention support, but,
as the Vector API is implemented now, it introduced a significant
amount of complexity on both sides, so I decided to keep the vector
intrinsics for now. It already enables significant simplifications in
the Vector API.
Still, it would be convenient to eventually get vector support in FFM.
Re. support for custom calling conventions to call into hotspot stubs
from Java, this might be possible - our story for supporting calling
conventions other than the system calling convention is that there
should be a dedicated linker instance per calling convention. So, if the
JVM defines its own calling convention for its stubs there should
probably be a custom Linker implementation that is used to call into
such stubs - which uses the machinery in the Linker implementation (e.g.
Bindings) to classify the incoming function descriptors and determine
the shuffle sequence for a particular call. This should all be
doable (at least inside the JDK) - it's just a matter of "writing more code".
Interesting. Thanks for the details.
I agree with Paul that, as we move more stuff to use Panama, we will
need to look more at the avenues available to us to claim back some of
the additional warm up cost introduced by the use of var/method handles.
This is probably part of a bigger exploration on warmup and FFM.
In the case of C2 intrinsics it may be less of an issue: the additional
startup cost may be quickly recuperated during warmup, because the
optimized implementation is available earlier.
Best regards,
Vladimir Ivanov
On 06/12/2024 23:18, Vladimir Ivanov wrote:
Recently, a trend emerged to use native libraries to back intrinsics
in the HotSpot JVM. SVML stubs for the Vector API paved the road, and
they were soon followed by the SLEEF and simdsort libraries.
After examining their support, I must confess that it doesn't look
pretty. It introduces significant accidental complexity on the JVM side.
HotSpot has to be taught about every entry point in each library in an
ad-hoc manner. It's inherently unsafe, error-prone to implement, and
hard to maintain: the JVM makes a lot of assumptions about an entry
point based solely on its symbolic name, and each library has its own
naming conventions. Overall, the current approach doesn't scale well.
Fortunately, the new FFI API (java.lang.foreign) was finalized in JDK 22.
It provides enough functionality to interact with native libraries from
Java in a performant manner.
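For reference, here is a minimal sketch of the two usage patterns
involved (a method handle per entry point vs. merely resolving a table
of entry points), using `strlen` from the C runtime as a stand-in for a
real library entry point; the class and method names are mine:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

public class FfmBindingSketch {
    static final Linker LINKER = Linker.nativeLinker();
    static final SymbolLookup STDLIB = LINKER.defaultLookup();

    // "Classic" binding: one downcall method handle per entry point.
    static final MethodHandle STRLEN = LINKER.downcallHandle(
            STDLIB.find("strlen").orElseThrow(),
            FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));

    static long strlenOf(String s) throws Throwable {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment cStr = arena.allocateFrom(s); // NUL-terminated C string
            return (long) STRLEN.invokeExact(cStr);
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(strlenOf("hello")); // prints 5

        // The other style: just resolve an address ("table of entry points");
        // the invocation itself would happen elsewhere (e.g., in an intrinsic).
        MemorySegment entry = STDLIB.find("strlen").orElseThrow();
        System.out.println(entry.address() != 0); // prints true
    }
}
```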
I did an exercise to migrate all 3 libraries away from intrinsics and
the results look promising:
simdsort: https://github.com/openjdk/jdk/pull/22621
SVML/SLEEF: https://github.com/openjdk/jdk/pull/22619
As of now, java.lang.foreign lacks vector calling convention support,
so the actual calls into SVML/SLEEF are still backed by intrinsics.
But it still enables a major cleanup on the JVM side.
Also, I coded library headers and used jextract to produce an initial
sketch of the library API in Java, and it worked really well. Eventually,
it can be incorporated into the JDK build process to ensure consistency
between the native and Java parts of the library API.
Performance-wise, it is on par with the current (intrinsic-based)
implementation.
One open question relates to CPU dispatching.
Each library exposes multiple functions with different requirements on
CPU ISA extension support (e.g., no AVX vs. AVX2 vs. AVX512, NEON vs.
SVE). Right now, it's the JVM's responsibility, but once the JVM gets
out of the loop, the library itself should make the decision. I
experimented with 2 approaches: (1) perform CPU dispatching while
linking the library from Java code (as illustrated in the
aforementioned PRs); or (2) call into the native library to query it
for the right entry point [1] [2] [3]. In both cases, it depends on an
additional API to sense the JVM/hardware capabilities (exposed on
jdk.internal.misc.VM for now).
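A minimal sketch of approach (1): the symbol names and the hard-coded
capability flags below are made up for illustration, standing in for
the bits that would come from jdk.internal.misc.VM and be resolved via
SymbolLookup:

```java
public class CpuDispatchSketch {
    // Capability flags; hard-coded here, but queried from the JVM in
    // reality (e.g., via an internal API on jdk.internal.misc.VM).
    record CpuFeatures(boolean avx512, boolean avx2) {}

    // Pick the symbol name to resolve with SymbolLookup based on the
    // CPU features; the symbol names are hypothetical.
    static String selectSortEntry(CpuFeatures cpu) {
        if (cpu.avx512()) return "avx512_sort";
        if (cpu.avx2())   return "avx2_sort";
        return "fallback_sort";
    }

    public static void main(String[] args) {
        System.out.println(selectSortEntry(new CpuFeatures(false, true))); // prints avx2_sort
    }
}
```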
Let me know if you have any questions/suggestions/concerns. Thanks!
I plan to eventually start publishing PRs to upstream this work.
Best regards,
Vladimir Ivanov
[1] https://github.com/openjdk/jdk/commit/b6e6f2e20772e86fbf9088bcef01391461c17f11
[2] https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/share/classes/java/util/SIMDSortLibrary.java
[3] https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/linux/native/libsimdsort/simdsort.c