On Wed, 3 Jun 2026 08:50:01 GMT, Per Minborg <[email protected]> wrote:

> ## Summary
> 
> This PR proposes to introduce a pooled confined arena as an optimization for 
> `Arena.ofConfined()`, where small native allocations can be served from a 
> reusable per-thread/per-slot memory pool instead of calling the regular 
> native allocator for every short-lived arena. The arena remains confined to 
> its owner thread and is still closed normally, but its backing storage can be 
> reset and reused when the arena closes. The feature requires no API changes.
> 
> ### Outline
> 
> Platform threads: one lazily allocated pool per Thread, encoded in 
> `Thread.confinedMemoryPool`.
> Virtual threads: fixed shared native pool with CAS-protected slots, because 
> per-virtual-thread native pools would not scale.
> 
> Pooled memory is zeroed out upon _closing_ an Arena to minimize data 
> visibility between reuse. This means the data is visible only within a TWR 
> block, and never outside it.
> 
> By default, a confined arena has access to 64 bytes of pooled data. The pool
> size is configurable via a system property and can be 8, 16, 32, or 64 bytes.
> Pooling can also be turned off completely by setting the pool power-of-two
> size to zero. Nested confined arenas are not supported.
> 
> ## Static Analysis
> 
> An extensive static corpus analysis of third-party libraries and the JDK 
> itself has been conducted with respect to `Arena.ofConfined()` usage, 
> revealing that confined arenas were used _only_ in TWR blocks and _never_ in 
> an unstructured way. The static analysis further revealed that in most cases, 
> only a small amount of native memory was ever allocated, usually less than 32 
> bytes, and in many cases, 8 bytes or less. This usage pattern lends itself 
> well to pooling. 
> 
> ## Dynamic Analysis
> 
> A dynamic statistical analysis of actual runs was also made, where various 
> properties of confined arenas were recorded and summarized during a complete 
> tier1 test run. While a tier1 run is not necessarily representative of a 
> typical application workload, it provided some interesting results:
> 
> The run produced 93 per-process histogram blocks and 788,773,092 closed 
> confined arenas. The result is dominated by arenas with no native allocation 
> at all: 375,934,768 arenas (47.661%) are in the zero-byte bucket. Counting 
> arenas up to 63 bytes covers 99.997% of all arena closures.
> 
> The largest count bucket is 8-15 bytes per arena with 400,951,293 arenas 
> (50.832% of all arenas). The largest byte bucket is 8-15 bytes per arena with 
> 3,207,623,039 B (3,059.03 MiB) (46.794% of all bytes). Buckets below 64 KiB 
> preserve very close t...

One possible concern here is with clients that expect `Arena::allocate` to
result in a call to `malloc`. Some of these clients might expect to be able to
override the system allocator (e.g. with jemalloc), perhaps to take advantage
of additional features such as use-after-free protection.

We have seen evidence of that here:
https://github.com/openjdk/jdk/pull/28235

The new implementation is a much more targeted cut; I like the direction.

src/java.base/share/classes/jdk/internal/foreign/ConfinedSession.java line 111:

> 109:             final long allocationByteSize = Math.max(1, byteSize);
> 110:             NativeMemorySegmentImpl segment;
> 111:             if (pool > 0 && (segment = trySlice(pool, 
> allocationByteSize, byteAlignment)) != null) {

I think here we can delay the memory segment creation even more -- what we need 
is something that gives us either a starting allocation address, or -1. If we 
get a starting address, then we can just do 
`MemorySegment.ofAddress(startAddress).reinterpret(arena, requestedSize)`
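
To make the "address or -1" idea concrete, here is a minimal sketch of what such a helper could look like. The class `AddressPool` and its methods are hypothetical names used for illustration only and are not taken from the PR; the sketch assumes the pooled region was allocated elsewhere and that alignments are powers of two.

```java
/** Hypothetical sketch: a bump-pointer pool that yields a start address, or -1. */
final class AddressPool {
    private final long base;     // start address of the pooled region (allocated elsewhere)
    private final long capacity; // size of the pooled region in bytes
    private long used;           // bytes consumed so far, including alignment padding

    AddressPool(long base, long capacity) {
        this.base = base;
        this.capacity = capacity;
    }

    /** Returns the aligned start address of the allocation, or -1 if it does not fit. */
    long tryAllocate(long byteSize, long byteAlignment) {
        if (byteSize < 0 || byteAlignment <= 0
                || (byteAlignment & (byteAlignment - 1)) != 0) {
            throw new IllegalArgumentException("bad size or alignment");
        }
        long size = Math.max(1, byteSize);       // mirror the Math.max(1, byteSize) in the diff
        if (size > capacity) {
            return -1;                           // can never fit; also avoids overflow below
        }
        long start = (base + used + byteAlignment - 1) & -byteAlignment; // round up
        long offset = start - base;
        if (offset > capacity - size) {
            return -1;                           // caller falls back to the regular allocator
        }
        used = offset + size;
        return start;
    }

    /** Makes the whole region available again, e.g. when the owning arena closes. */
    void reset() {
        used = 0;
    }
}
```

With something like this, the caller would only create a segment when `tryAllocate` returns a non-negative address, and would take the regular allocation path on -1.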

src/java.base/share/classes/jdk/internal/foreign/MemorySessionImpl.java line 
141:

> 139:         this.owner = owner;
> 140:         this.resourceList = resourceList;
> 141:         super();

why?

src/java.base/share/classes/jdk/internal/foreign/ThreadConfinedSegmentPool.java 
line 116:

> 114:     }
> 115: 
> 116:     final class CachedArena implements Arena, NoInitAllocator {

My general feeling here is that the implementation is arranged the wrong way. 
E.g. in my mind, we have ArenaImpl, which is the type of the builtin arena we 
return. And, if an ArenaImpl is confined, it can allocate memory more cheaply, 
with the help of some kind of thread-backed allocator.

I feel the right arrangement is to have a SegmentAllocator (not an Arena) that
returns usable regions of memory from a given thread. Maybe the allocator is
very low level: it computes the next pointer and does a
`MemorySegment.ofAddress(ptr)` for the region. Then ArenaImpl::allocate
takes that and does a reinterpret with the correct arena and size. When the
confined arena closes, the memory is returned to the underlying pool.
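
A rough sketch of that layering, using plain Java stand-ins rather than the real FFM types so that it stays self-contained. All names here (`ThreadRegionAllocator`, `SketchConfinedArena`, `Region`) are hypothetical; the 64-byte capacity, the "no pool for nested arenas" rule, and the assumption that the pool base is sufficiently aligned are illustrative choices, not the PR's implementation.

```java
/** Hypothetical per-thread allocator; it only computes offsets into a fixed-size pool. */
final class ThreadRegionAllocator {
    private static final ThreadLocal<ThreadRegionAllocator> PER_THREAD =
            ThreadLocal.withInitial(() -> new ThreadRegionAllocator(64));

    private final long capacity;
    private long used;
    private boolean leased; // true while one confined arena is borrowing the pool

    private ThreadRegionAllocator(long capacity) {
        this.capacity = capacity;
    }

    static ThreadRegionAllocator current() {
        return PER_THREAD.get();
    }

    /** Only one arena per thread may use the pool at a time (assumed: no nesting). */
    boolean tryLease() {
        if (leased) {
            return false;
        }
        leased = true;
        return true;
    }

    /** Returns the aligned offset of the next region, or -1 if it does not fit.
     *  Assumes byteAlignment is a power of two and the pool base is aligned enough. */
    long nextOffset(long byteSize, long byteAlignment) {
        long size = Math.max(1, byteSize);
        long start = (used + byteAlignment - 1) & -byteAlignment;
        if (size > capacity || start > capacity - size) {
            return -1;
        }
        used = start + size;
        return start;
    }

    /** Returns the memory to the pool; a real pool would also zero it here. */
    void release() {
        used = 0;
        leased = false;
    }
}

/** Hypothetical stand-in for the built-in confined arena. */
final class SketchConfinedArena implements AutoCloseable {
    /** Where an allocation came from; a real arena would produce a MemorySegment. */
    record Region(boolean pooled, long offset, long byteSize) {}

    private final ThreadRegionAllocator pool; // null when the pool is already leased

    SketchConfinedArena() {
        ThreadRegionAllocator candidate = ThreadRegionAllocator.current();
        this.pool = candidate.tryLease() ? candidate : null;
    }

    Region allocate(long byteSize, long byteAlignment) {
        if (pool != null) {
            long offset = pool.nextOffset(byteSize, byteAlignment);
            if (offset >= 0) {
                return new Region(true, offset, byteSize);
            }
        }
        // A real arena would call the regular native allocator here.
        return new Region(false, -1, byteSize);
    }

    @Override
    public void close() {
        if (pool != null) {
            pool.release();
        }
    }
}
```

The point of the arrangement is that the allocator knows nothing about arenas or sessions; the arena alone decides between the pooled and regular paths and hands the region back when it closes.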

Since this is the builtin confined arena we're talking about, I'm not sure 
about CachedArena -- as that looks like any other 3rd party Arena. I think we 
can achieve tighter integration?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/31365#issuecomment-4611823247
PR Comment: https://git.openjdk.org/jdk/pull/31365#issuecomment-4660065460
PR Review Comment: https://git.openjdk.org/jdk/pull/31365#discussion_r3380808081
PR Review Comment: https://git.openjdk.org/jdk/pull/31365#discussion_r3380812287
PR Review Comment: https://git.openjdk.org/jdk/pull/31365#discussion_r3348040236
