On Wed, 18 Oct 2023 09:42:27 GMT, Jorn Vernee <jver...@openjdk.org> wrote:

>> Add the ability to pass heap segments to native code. This requires using 
>> `Linker.Option.critical(true)` as a linker option. It has the same 
>> limitations as normal critical calls, namely: upcalls into Java are not 
>> allowed, and the native function should return relatively quickly. Heap 
>> segments are exposed to native code through temporary native addresses that 
>> are valid for the duration of the native call.
>> 
>> The motivation for this is supporting existing Java array-based APIs that 
>> might have to pass multi-megabyte size arrays to native code, and are 
>> currently relying on Get-/ReleasePrimitiveArrayCritical from JNI, where making 
>> a copy of the array would be prohibitively expensive.
>> 
>> Components of this patch:
>> 
>> - New binding operator `SegmentBase`, which gets the base object of a 
>> `MemorySegment`.
>> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether 
>> processing heap segments should be allowed.
>> - `CallArranger` impls use new binding operators when 
>> `Linker.Option.critical(/* allowHeap= */ true)` is specified.
>> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures.
>> - The object/oop + offset is exposed as a temporary address to native code.
>> - Since we stay in the `_thread_in_Java` state, we can safely expose the 
>> oops passed to the downcall stub to native code, without needing GCLocker. 
>> These oops are valid until we poll for safepoint, which we never do 
>> (invoking pure native code).
>> - Only x64 and AArch64 for now.
>> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on 
>> callbacks to get the set of source and destination registers (using 
>> `CallingConventionClosure`), but instead just rely on 2 equal size arrays 
>> with source and destination registers. This allows filtering the input java 
>> registers before passing them to `ArgumentShuffle`, which is required to 
>> filter out registers holding segment offsets. Replacing placeholder 
>> registers is also done as a separate pre-processing step now. See changes 
>> in: 
>> https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866
>> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to 
>> use a common `DowncallLinker::StubGenerator`.
>> - Fallback linker is also supported using JNI's 
>> `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical`
>> 
>> Aside: fixed existing issue with `DowncallLinker` not properly acquiring 
>> segments in interpreted mode.
>> 
>> Numbers for the included benchmark on my machine are:
>> 
>> 
>> Benchmar...
>
> Jorn Vernee has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Phrasing
>   
>   Co-authored-by: Maurizio Cimadamore 
> <54672762+mcimadam...@users.noreply.github.com>

Added another benchmark to the patch that xors 2 arrays together using various 
strategies. These are the results on my machine:


Benchmark         (arrayKind)  (sizeKind)  Mode  Cnt   Score    Error  Units
XorTest.xor      JNI_ELEMENTS       SMALL  avgt   30   0.555 ±  0.010  ms/op
XorTest.xor      JNI_ELEMENTS      MEDIUM  avgt   30   4.610 ±  0.114  ms/op
XorTest.xor      JNI_ELEMENTS       LARGE  avgt   30  53.533 ±  2.113  ms/op
XorTest.xor        JNI_REGION       SMALL  avgt   30   0.030 ±  0.001  ms/op
XorTest.xor        JNI_REGION      MEDIUM  avgt   30   1.498 ±  0.041  ms/op
XorTest.xor        JNI_REGION       LARGE  avgt   30   7.544 ±  0.188  ms/op
XorTest.xor      JNI_CRITICAL       SMALL  avgt   30   0.035 ±  0.005  ms/op
XorTest.xor      JNI_CRITICAL      MEDIUM  avgt   30   0.496 ±  0.003  ms/op
XorTest.xor      JNI_CRITICAL       LARGE  avgt   30   2.521 ±  0.035  ms/op
XorTest.xor   FOREIGN_NO_INIT       SMALL  avgt   30   0.030 ±  0.001  ms/op
XorTest.xor   FOREIGN_NO_INIT      MEDIUM  avgt   30   1.303 ±  0.021  ms/op
XorTest.xor   FOREIGN_NO_INIT       LARGE  avgt   30   7.668 ±  0.168  ms/op
XorTest.xor      FOREIGN_INIT       SMALL  avgt   30   0.031 ±  0.001  ms/op
XorTest.xor      FOREIGN_INIT      MEDIUM  avgt   30   1.485 ±  0.012  ms/op
XorTest.xor      FOREIGN_INIT       LARGE  avgt   30   9.183 ±  0.247  ms/op
XorTest.xor  FOREIGN_CRITICAL       SMALL  avgt   30   0.026 ±  0.001  ms/op
XorTest.xor  FOREIGN_CRITICAL      MEDIUM  avgt   30   0.501 ±  0.002  ms/op
XorTest.xor  FOREIGN_CRITICAL       LARGE  avgt   30   2.578 ±  0.023  ms/op
XorTest.xor            UNSAFE       SMALL  avgt   30   0.029 ±  0.001  ms/op
XorTest.xor            UNSAFE      MEDIUM  avgt   30   1.300 ±  0.013  ms/op
XorTest.xor            UNSAFE       LARGE  avgt   30   7.632 ±  0.178  ms/op


The important part here is that `FOREIGN_CRITICAL` (the new feature) is on par 
with `JNI_CRITICAL`.
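
For readers who have not used the option yet, here is a minimal sketch of what 
passing a heap segment to a critical downcall might look like. It is not the 
benchmark code: it assumes the `Linker.Option.critical(true)` shape described 
in the PR on a JDK where the FFM API is final, uses the C library's `strlen` 
as a stand-in native function, and assumes a 64-bit `size_t`. The class and 
method names are hypothetical.

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Hypothetical illustration class; not part of the patch.
final class CriticalHeapSketch {

    // Calls C strlen directly on a Java byte[] without copying it off-heap.
    // The array must contain a terminating 0 byte.
    static long nativeStrlen(byte[] nulTerminated) {
        Linker linker = Linker.nativeLinker();
        MethodHandle strlen = linker.downcallHandle(
                linker.defaultLookup().find("strlen").orElseThrow(),
                // Assumption: size_t is 64 bits wide on this platform.
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS),
                // critical(true) allows heap segments as pointer arguments.
                Linker.Option.critical(true));

        // A heap segment backed by the array; no native allocation involved.
        MemorySegment heap = MemorySegment.ofArray(nulTerminated);
        try {
            return (long) strlen.invokeExact(heap);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Calling `CriticalHeapSketch.nativeStrlen(new byte[] {'h', 'e', 'l', 'l', 'o', 0})` 
should return 5. The same restrictions as for other critical calls apply: the 
native function must not upcall into Java and should return quickly.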

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1768370164
