On Sun, 14 Jul 2024 11:01:58 GMT, Uwe Schindler <uschind...@openjdk.org> wrote:

> I have one problem with the benchmark: I think it is not measuring the whole 
> setup in a way that is our workload: The basic problem is that we don't want 
> to deoptimize threads which are not related to MemorySegments. So basically, 
> the throughput of those threads should not be affected. For threads currently 
> in a memory-segment read it should have a bit of effect, but it should 
> recover fast.

IMHO there is a bit of confusion in this discussion. When we say that a shared 
arena close operation is slow, we might mean one of two things:

1. calling the `close()` method itself is slow (this is what the benchmark 
effectively measures)
2. throughput of unrelated threads is affected (I think this is what Lucene is 
seeing)

Addressing (2) is more important than addressing (1) (in the sense that, if you 
sign up for a shared arena close, you know it's going to be deterministic, but 
expensive, as the javadoc itself admits).

For this reason, I'm unsure about some of the "delaying tactics" I see 
mentioned here: if we delay the underlying "free"/"unmap" operation, this is 
only going to affect (1). You still need some global operation (e.g. handshake) 
to make sure all threads agree on the segment state. Moving the cost of the 
free/unmap from one place to another is not really going to do much for (2).
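
To make the distinction between (1) and (2) concrete, here is a rough sketch 
(not the benchmark from this PR) that records both numbers separately. It 
assumes JDK 22 or later, where `java.lang.foreign` is final. The class name 
`SharedCloseSketch`, the `Result` record and the `run` method are made up for 
illustration only.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

public class SharedCloseSketch {
    // Hypothetical holder: total time spent inside close(), and the amount of
    // work the unrelated thread completed while the closes were happening.
    record Result(long closeNanos, long unrelatedIterations) {}

    static Result run(int closes) {
        AtomicBoolean stop = new AtomicBoolean();
        AtomicLong counter = new AtomicLong();

        // "Unrelated" thread: it never touches a MemorySegment, it only spins.
        Thread unrelated = new Thread(() -> {
            long local = 0;
            while (!stop.get()) {
                local++;
            }
            counter.set(local);
        });
        unrelated.start();

        long closeNanos = 0;
        for (int i = 0; i < closes; i++) {
            Arena arena = Arena.ofShared();
            MemorySegment segment = arena.allocate(64);
            segment.set(ValueLayout.JAVA_INT, 0, i);

            long start = System.nanoTime();
            arena.close(); // meaning (1): cost of the close() call itself
            closeNanos += System.nanoTime() - start;
        }

        stop.set(true);
        try {
            unrelated.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Meaning (2) would show up as a lower iteration count than in a run
        // of the same duration that performs no shared closes.
        return new Result(closeNanos, counter.get());
    }

    public static void main(String[] args) {
        Result r = run(1_000);
        System.out.println("time in close(): " + r.closeNanos() + " ns");
        System.out.println("unrelated iterations: " + r.unrelatedIterations());
    }
}
```

The iteration count only means something when compared against a baseline of 
the same wall-clock duration without any shared closes, and a serious 
measurement would need a proper harness such as JMH. The point is only that (1) 
and (2) are separate numbers, and that moving the free/unmap elsewhere would 
change the first without necessarily changing the second.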

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228002760
