On Sun, 14 Jul 2024 11:01:58 GMT, Uwe Schindler <uschind...@openjdk.org> wrote:
> I have one problem with the benchmark: I think it is not measuring the whole
> setup in a way that is our workload: The basic problem is that we don't want
> to deoptimize threads which are not related to MemorySegments. So basically,
> the throughput of those threads should not be affected. For threads currently
> in a memory-segment read it should have a bit of effect, but it should
> recover fast.

IMHO there is a bit of confusion in this discussion. When we say that a shared arena close operation is slow, we might mean one of two things:

1. calling the `close()` method itself is slow (this is what the benchmark effectively measures)
2. throughput of unrelated threads is affected (I think this is what Lucene is seeing)

Addressing (2) is more important than addressing (1), in the sense that if you sign up for a shared arena close, you know it's going to be deterministic but expensive, as the javadoc itself admits.

For this reason, I'm unsure about some of the "delaying tactics" I see mentioned here: if we delay the underlying "free"/"unmap" operation, this is only going to affect (1). You still need some global operation (e.g. a handshake) to make sure all threads agree on the segment state. Moving the cost of the free/unmap from one place to another is not really going to do much for (2).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20158#issuecomment-2228002760
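To make the distinction between (1) and (2) concrete, here is a minimal sketch, assuming JDK 22+ where the `java.lang.foreign` API is final. The class `SharedCloseSketch`, its `run` method and the `Result` record are hypothetical names for illustration; this is not the benchmark from the PR. For a fixed wall-clock window it times each `Arena.close()` on a shared arena (concern 1) while a thread that never touches a `MemorySegment` counts loop iterations. Comparing that count against a baseline window with no closes is what would expose concern (2).

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.LongAdder;

public class SharedCloseSketch {

    // Hypothetical result: number of shared-arena closes, total time spent
    // inside close() (concern 1), progress of an unrelated thread (raw input
    // for concern 2), and whether the last closed segment still reports alive.
    record Result(long closes, long closeNanos, long unrelatedOps, boolean lastSegmentAlive) {}

    static Result run(boolean closeShared, long millis) {
        AtomicBoolean running = new AtomicBoolean(true);
        LongAdder ops = new LongAdder();

        // A thread whose work is unrelated to any MemorySegment.
        Thread unrelated = new Thread(() -> {
            while (running.get()) {
                ops.increment();
            }
        });
        unrelated.start();

        long deadline = System.nanoTime() + millis * 1_000_000L;
        long closes = 0;
        long closeNanos = 0;
        MemorySegment last = null;
        while (System.nanoTime() < deadline) {
            if (closeShared) {
                Arena arena = Arena.ofShared();
                MemorySegment seg = arena.allocate(ValueLayout.JAVA_LONG);
                seg.set(ValueLayout.JAVA_LONG, 0, closes);
                long t0 = System.nanoTime();
                arena.close(); // concern (1): cost of the close() call itself
                closeNanos += System.nanoTime() - t0;
                closes++;
                last = seg;
            } else {
                Thread.onSpinWait(); // baseline: keep this thread busy, no closes
            }
        }

        running.set(false);
        try {
            unrelated.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        boolean alive = last != null && last.scope().isAlive();
        return new Result(closes, closeNanos, ops.sum(), alive);
    }

    public static void main(String[] args) {
        Result baseline = run(false, 500);
        Result withCloses = run(true, 500);
        System.out.printf("closes: %d, total close() time: %d us%n",
                withCloses.closes(), withCloses.closeNanos() / 1_000);
        System.out.printf("unrelated ops: baseline=%d, with closes=%d%n",
                baseline.unrelatedOps(), withCloses.unrelatedOps());
    }
}
```

A low per-call `close()` time on its own says nothing about (2); the number that matters for that concern is how far `unrelatedOps` drops relative to the baseline window of the same length. A toy loop like this is only indicative; a real measurement would use JMH.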