On Thu, 16 Jan 2025 11:58:20 GMT, Per Minborg <pminb...@openjdk.org> wrote:
>> Going forward, converting older JDK code to use the relatively new FFM API
>> requires system calls that can provide `errno` and the like to explicitly
>> allocate a MemorySegment to capture potential error states. This can have
>> negative performance implications if not designed carefully and also
>> introduces unnecessary code complexity.
>>
>> Hence, this PR proposes to add a _JDK internal_ method handle adapter that
>> can be used to handle system calls with `errno`, `GetLastError`, and
>> `WSAGetLastError`.
>>
>> It currently relies on a thread-local cache of MemorySegments to elide
>> allocations. If, in the future, a more efficient thread-associated
>> allocation scheme becomes available, we could easily migrate to that one.
>>
>> Here are some benchmarks:
>>
>>     Benchmark                                        Mode  Cnt   Score   Error  Units
>>     CaptureStateUtilBench.explicitAllocationFail     avgt   30  41.615 ± 1.203  ns/op
>>     CaptureStateUtilBench.explicitAllocationSuccess  avgt   30  23.094 ± 0.580  ns/op
>>     CaptureStateUtilBench.threadLocalFail            avgt   30  14.760 ± 0.078  ns/op
>>     CaptureStateUtilBench.threadLocalReuseSuccess    avgt   30   7.189 ± 0.151  ns/op
>>
>> Explicit allocation:
>>
>>     try (var arena = Arena.ofConfined()) {
>>         return (int) HANDLE.invoke(arena.allocate(4), 0, 0);
>>     }
>>
>> Thread Local (tl):
>>
>>     return (int) ADAPTED_HANDLE.invoke(arena.allocate(4), 0, 0);
>>
>> The graph below shows the difference in latency for a successful call:
>>
>> 
>>
>> This is a ~3x improvement for both the happy and the error path.
>>
>> Tested and passed tiers 1-3.
>
> Per Minborg has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Remove unused class

As mentioned in https://mail.openjdk.org/pipermail/panama-dev/2025-January/020882.html, I've been looking at pretty much exactly the same thing for another buffer required in the call sequence, with slightly different takeaways:

* The implementation in your PR always leaves ownership of the segments inside the thread-local. Borrowing/releasing threads need to check "can I take this?" and "am I still on the right carrier, trying to return this to the right slot?". I find it easier to actually remove the elements from the cache slot, and potentially return them to the cache slot of a different CT. Ownership and responsibilities seem clearer to me this way.

* I've built on the assumption that the "CTL.get + test/modify" sequences are actually atomic with respect to virtual-thread scheduling (NIO code appears to rely on the same) and that no mounting/unmounting will happen inside them (but it _may_ happen within the acquire-invoke-release bracket; I have actually observed this between invoke and release). If rescheduling can happen inside these sequences and multiple threads contend for the CT-local element, I'm not 100% sure there is a point to using a CTL in the first place.

src/java.base/share/classes/jdk/internal/foreign/CaptureStateUtil.java line 333:

> 331:         SegmentPool cache = TL_POOLS.get();
> 332:         if (cache == null) {
> 333:             TL_POOLS.set(cache = new SegmentPool());

Why not use `initialValue`?
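For illustration, a minimal sketch of the `initialValue` variant being suggested. The `TL_POOLS` and `SegmentPool` names are taken from the quoted snippet; everything else is an assumption, not the PR's actual code:

    // Sketch only -- assumes TL_POOLS is a ThreadLocal-style holder (ThreadLocal or
    // a subclass such as jdk.internal.misc.CarrierThreadLocal) whose initialValue()
    // can be overridden.
    private static final ThreadLocal<SegmentPool> TL_POOLS = new ThreadLocal<>() {
        @Override
        protected SegmentPool initialValue() {
            return new SegmentPool();   // created lazily on each thread's first get()
        }
    };

    // The call site from the quoted snippet then collapses to a plain lookup,
    // with no null check and no explicit set():
    SegmentPool cache = TL_POOLS.get();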
src/java.base/share/classes/jdk/internal/util/SingleElementPool.java line 124:

> 122:     public void close() {
> 123:         if (UNSAFE.compareAndSetInt(this, CLOSED_OFFSET, FALSE, TRUE)) {
> 124:             if (UNSAFE.getIntVolatile(this, POOLED_ELEMENT_TAKEN_OFFSET) == FALSE) {

If we follow the premise that "any scheduling can happen", then this `close` could be racing with another thread trying to borrow the pooled element of this already-dead carrier thread:

    VT@CT1:  pool = tl.get()
             unmount
    CT1 dies => pool1.close
                taken == false => recycle
    VT@CT2:  mount
             pool.take() succeeds since `taken` is still false
             BOOM

I believe the close action would need to "take" the element before recycling it. Wouldn't this obviate the need for a separate "closed" slot?

Again, I'm not sure whether this scheduling is actually possible at this time.

src/java.base/share/classes/jdk/internal/util/SingleElementPool.java line 132:

> 130:     @ForceInline
> 131:     private boolean takePooledElement() {
> 132:         return UNSAFE.getAndSetInt(this, POOLED_ELEMENT_TAKEN_OFFSET, TRUE) == FALSE;

Does this need to use atomics? Looking at similar code in sun.nio.ch.Util.BufferCache, which seems to work without them. My interpretation is that BufferCache relies on a virtual thread not getting preempted inside the { CTL.get().takeElement } sequence? Would the same reasoning not work here? I.e.

    element = carrierThreadLocal.get()   [1]
    if (element.x)                       [2]
    element.y                            [3]

a) Can this virtual thread be moved to another carrier at [2] or [3]?
b) Can this virtual thread be suspended and another one scheduled between [1] and [2]/[3]?

My understanding is that in the current implementation of virtual threads, this does not happen and we can manage the cache element with straight non-atomic code (correct?). If not, how does BufferCache manage (pinned somewhere?)?

src/java.base/share/classes/jdk/internal/util/SingleElementPool.java line 137:

> 135:     @ForceInline
> 136:     private void releasePooledElement() {
> 137:         UNSAFE.putIntVolatile(this, POOLED_ELEMENT_TAKEN_OFFSET, FALSE);

Similar to the above: the use of volatile suggests that you're concerned about a virtual-thread switch. It would be good to have clarification on this.

-------------

PR Review: https://git.openjdk.org/jdk/pull/22391#pullrequestreview-2556247903
PR Review Comment: https://git.openjdk.org/jdk/pull/22391#discussion_r1918610681
PR Review Comment: https://git.openjdk.org/jdk/pull/22391#discussion_r1918706463
PR Review Comment: https://git.openjdk.org/jdk/pull/22391#discussion_r1918637436
PR Review Comment: https://git.openjdk.org/jdk/pull/22391#discussion_r1918648090
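P.S. For completeness, a minimal sketch of the "take before recycle" shape suggested for `close()` above. The field names mirror the quoted snippet; the recycle hook and the surrounding structure are assumptions, not the PR's actual code:

    // Sketch only, not the PR's code: close() claims the pooled element with the
    // same getAndSet that the borrow path uses, so it cannot race with a concurrent
    // take(), and no separate `closed` flag is needed to guard the recycling step.
    public void close() {
        if (UNSAFE.getAndSetInt(this, POOLED_ELEMENT_TAKEN_OFFSET, TRUE) == FALSE) {
            // We won the race: no other thread holds the element, so recycle it here.
            recycle(pooledElement);   // hypothetical recycle hook
        }
        // If the element was already taken, the borrower currently holds it; what the
        // release path should do in that case is a separate design question.
    }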