On Wed, 18 Oct 2023 04:44:26 GMT, Jorn Vernee <jver...@openjdk.org> wrote:
>> Add the ability to pass heap segments to native code. This requires using
>> `Linker.Option.critical(true)` as a linker option. It has the same
>> limitations as normal critical calls, namely: upcalls into Java are not
>> allowed, and the native function should return relatively quickly. Heap
>> segments are exposed to native code through temporary native addresses
>> that are valid for the duration of the native call.
>>
>> The motivation for this is supporting existing Java array-based APIs that
>> might have to pass multi-megabyte arrays to native code, and that are
>> currently relying on Get-/ReleasePrimitiveArrayCritical from JNI, where
>> making a copy of the array would be prohibitively expensive.
>>
>> Components of this patch:
>>
>> - New binding operator `SegmentBase`, which gets the base object of a
>>   `MemorySegment`.
>> - Renamed `UnboxAddress` to `SegmentOffset`, and added a flag to specify
>>   whether processing heap segments should be allowed.
>> - `CallArranger` impls use the new binding operators when
>>   `Linker.Option.critical(/* allowHeap= */ true)` is specified.
>> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their
>>   signatures.
>> - The object/oop + offset is exposed as a temporary address to native
>>   code.
>> - Since we stay in the `_thread_in_Java` state, we can safely expose the
>>   oops passed to the downcall stub to native code without needing
>>   GCLocker. These oops are valid until we poll for a safepoint, which we
>>   never do (we invoke pure native code).
>> - Only x64 and AArch64 for now.
>> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on
>>   callbacks (via `CallingConventionClosure`) to get the set of source and
>>   destination registers, but instead just rely on two equal-size arrays
>>   with source and destination registers. This allows filtering the input
>>   Java registers before passing them to `ArgumentShuffle`, which is
>>   required to filter out registers holding segment offsets.
>>   Replacing placeholder registers is also done as a separate
>>   pre-processing step now. See changes in:
>>   https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866
>> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code
>>   into a common `DowncallLinker::StubGenerator`.
>> - The fallback linker is also supported, using JNI's
>>   `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical`.
>>
>> Aside: fixed an existing issue with `DowncallLinker` not properly
>> acquiring segments in interpreted mode.
>>
>> Numbers for the included benchmark on my machine are:
>>
>> Benchmar...
>
> Jorn Vernee has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   drop unused in_reg_spiller

test/micro/org/openjdk/bench/java/lang/foreign/CriticalCalls.java line 101:

> 99:     public int callNotPinned() throws Throwable {
> 100:         try (Arena arena = Arena.ofConfined()) {
> 101:             MemorySegment nativeArr = arena.allocate(JAVA_INT, arr.length);

This allocation method will zero the segment. I think that's a bit unfair and will bias the results against the non-pinned variant, right?

test/micro/org/openjdk/bench/java/lang/foreign/CriticalCalls.java line 110:

> 108:     public int callRecycled() throws Throwable {
> 109:         MemorySegment nativeArr = recycler.allocate(JAVA_INT, arr.length);
> 110:         MemorySegment.copy(arr, 0, nativeArr, JAVA_INT, 0, arr.length);

Same here - I believe a single call to the correct `allocate` method suffices, though in this case I don't think performance is skewed too much.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1363578019
PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1363578910
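For readers following along, the feature described in the quoted PR text can be sketched from the caller's side. This is a minimal, hypothetical illustration (not code from the PR): it links `strlen` from libc with `Linker.Option.critical(true)` and passes a heap-backed segment (`MemorySegment.ofArray`) directly, relying on the temporary native address the runtime exposes for the duration of the call. It assumes a JDK where the finalized FFM API and `critical(boolean allowHeap)` are available (JDK 22+).

```java
import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;

public class CriticalHeapDemo {
    public static void main(String[] args) throws Throwable {
        Linker linker = Linker.nativeLinker();

        // allowHeap = true: heap segments may be passed by reference.
        // Same restrictions as other critical calls apply: no upcalls,
        // and the native function should return quickly.
        MethodHandle strlen = linker.downcallHandle(
                linker.defaultLookup().find("strlen").orElseThrow(),
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS),
                Linker.Option.critical(true));

        byte[] bytes = "hello\0".getBytes();              // NUL-terminated C string
        MemorySegment heapSeg = MemorySegment.ofArray(bytes); // heap segment, no copy

        long len = (long) strlen.invokeExact(heapSeg);
        System.out.println(len); // 5
    }
}
```

Without `critical(true)`, passing a heap segment to a downcall handle throws `IllegalArgumentException`, which is exactly the copy-to-native-memory cost this PR is trying to let callers avoid.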
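On the benchmark comments above: the "correct `allocate` method" the reviewer alludes to is presumably an allocate-and-copy variant such as `SegmentAllocator.allocateFrom(OfInt, int...)` (available in the finalized FFM API, JDK 22+), which sizes the segment from the array and copies in one step rather than zeroing a fresh segment and then overwriting it. A small sketch of the difference, with hypothetical variable names:

```java
import java.lang.foreign.*;
import static java.lang.foreign.ValueLayout.JAVA_INT;

public class AllocateFromDemo {
    public static void main(String[] args) {
        int[] arr = {1, 2, 3, 4};
        try (Arena arena = Arena.ofConfined()) {
            // Two-step form, as in the benchmark: allocate() zeroes the
            // segment first, then copy() overwrites it - wasted work.
            MemorySegment zeroed = arena.allocate(JAVA_INT, arr.length);
            MemorySegment.copy(arr, 0, zeroed, JAVA_INT, 0, arr.length);

            // One-step form: allocateFrom sizes, allocates, and copies
            // without the intermediate zeroing pass.
            MemorySegment nativeArr = arena.allocateFrom(JAVA_INT, arr);

            System.out.println(nativeArr.getAtIndex(JAVA_INT, 2)); // 3
        }
    }
}
```

For a benchmark comparing pinned heap access against copy-to-native, the zeroing pass in the two-step form adds cost only to the copying variant, which is the bias the first review comment points out.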