On Tue, 18 Feb 2025 14:33:16 GMT, Aleksey Shipilev <sh...@openjdk.org> wrote:
>> DirectByteBuffers are still using old `jdk.internal.ref.Cleaner` >> implementation. That implementation carries a doubly-linked list, and so >> makes DBB suffer from the same issue fixed for generic >> `java.lang.ref.Cleaner` users with >> [JDK-8343704](https://bugs.openjdk.org/browse/JDK-8343704). See the bug for >> the reproducer. >> >> We can migrate DBBs to use `java.lang.ref.Cleaner`. >> >> There are two pecularities during this rewrite. >> >> First, the old ad-hoc `Cleaner` implementation used to exit the VM when >> cleaning action failed. I presume it was to avoid memory leak / accidental >> reuse of the buffer. I moved the relevant block to `Deallocator` directly. >> Unfortunately, I cannot easily test it. >> >> Second is quite a bit hairy. Old DBB cleaning code was hooked straight into >> `Reference` processing loop. This was possible because we could infer that >> the weak references we are processing were DBB cleaning actions, since old >> `Cleaner` was the only use of this code. With standard `Cleaner`, we have >> lost this association, and so we cannot really do this from the reference >> processing loop. With the patched version, we now rely on normal `Cleaner` >> thread to do cleanups for us. >> >> Because of this, there is a new outpacing opportunity window where reference >> processing might have been over, but cleaner thread has not reacted yet. >> This is why we need another way to check progress that involves checking if >> cleaner has acted. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `java/nio java/io` >> - [x] Linux AArch64 server fastdebug, `java/nio java/io` >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one > additional commit since the last revision: > > A bit more comments I've made a prototype that adds Cleaner.waitForCleaning() and changes Bits.reserveMemory() to use it. https://github.com/openjdk/jdk/compare/master...kimbarrett:openjdk-jdk:wait-for-cleaning?expand=1 It's based on the shipilev commits 1-19, for ease of comparison and prototyping. There are also some shortcuts (like making some things public that likely shouldn't be), again for ease of prototyping. I ran the DirectBufferAllocTest test a few hundred times without any problems. I also ran it directly, configured with a 1/2 hour runtime, again without problems. I ran the two new micros against baseline, the shipilev changes only, and the waitForCleaning changes. Results below. DirectByteBufferGC baseline Benchmark (count) Mode Cnt Score Error Units DirectByteBufferGC.test 16384 avgt 9 4.036 ± 0.244 ms/op DirectByteBufferGC.test 65536 avgt 9 7.117 ± 0.184 ms/op DirectByteBufferGC.test 262144 avgt 9 18.482 ± 1.064 ms/op DirectByteBufferGC.test 1048576 avgt 9 194.590 ± 17.639 ms/op DirectByteBufferGC.test 4194304 avgt 9 802.354 ± 57.576 ms/op shipilev Benchmark (count) Mode Cnt Score Error Units DirectByteBufferGC.test 16384 avgt 9 4.104 ± 0.045 ms/op DirectByteBufferGC.test 65536 avgt 9 6.875 ± 0.244 ms/op DirectByteBufferGC.test 262144 avgt 9 17.943 ± 0.446 ms/op DirectByteBufferGC.test 1048576 avgt 9 59.002 ± 3.616 ms/op DirectByteBufferGC.test 4194304 avgt 9 331.328 ± 36.580 ms/op waitForCleaning Benchmark (count) Mode Cnt Score Error Units DirectByteBufferGC.test 16384 avgt 9 4.058 ± 0.167 ms/op DirectByteBufferGC.test 65536 avgt 9 6.775 ± 0.281 ms/op DirectByteBufferGC.test 262144 avgt 9 17.724 ± 0.602 ms/op DirectByteBufferGC.test 1048576 avgt 9 57.284 ± 2.679 ms/op DirectByteBufferGC.test 4194304 avgt 9 330.002 ± 44.200 ms/op DirectByteBufferChurn baseline Benchmark (recipFreq) Mode Cnt Score Error Units DirectByteBufferChurn.test 128 avgt 9 15.693 ± 0.991 ns/op DirectByteBufferChurn.test 256 avgt 9 11.255 ± 0.369 ns/op DirectByteBufferChurn.test 512 avgt 9 9.422 ± 0.382 ns/op DirectByteBufferChurn.test 1024 avgt 9 8.529 ± 0.311 ns/op DirectByteBufferChurn.test 2048 avgt 9 7.833 ± 0.287 ns/op shipilev Benchmark (recipFreq) Mode Cnt Score Error Units DirectByteBufferChurn.test 128 avgt 9 12.524 ± 0.476 ns/op DirectByteBufferChurn.test 256 avgt 9 9.248 ± 0.175 ns/op DirectByteBufferChurn.test 512 avgt 9 8.583 ± 0.197 ns/op DirectByteBufferChurn.test 1024 avgt 9 8.227 ± 0.274 ns/op DirectByteBufferChurn.test 2048 avgt 9 7.526 ± 0.394 ns/op waitForCleaning Benchmark (recipFreq) Mode Cnt Score Error Units DirectByteBufferChurn.test 128 avgt 9 11.235 ± 1.998 ns/op DirectByteBufferChurn.test 256 avgt 9 9.129 ± 0.324 ns/op DirectByteBufferChurn.test 512 avgt 9 8.446 ± 0.115 ns/op DirectByteBufferChurn.test 1024 avgt 9 7.786 ± 0.425 ns/op DirectByteBufferChurn.test 2048 avgt 9 7.477 ± 0.150 ns/op The baseline DirectByteBufferGC suffers significantly at higher counts, probably because of the long doubly-linked list of registered jdk.internal.ref.Cleaner objects. By eye the performance of both shipilev and waitForCleaning on the DBBChurn benchmark seem to be consistently slightly better than baseline, and within the noise of each other for both benchmarks. This waitForCleaning approach raises several questions. (1) Is the improved reliability of this approach worth the additional effort? My inclination is to say yes. But note that it should perhaps be different effort - see item (2). (2) Should Cleaner be responsible for providing the additional functionality? An alternative is to have Bits implement it's own cleaning, directly using a private PhantomReference derivative, ReferenceQueue, and Thread. If there were multiple (potential) clients for this sort of thing then putting it in Cleaner may makes sense, so long as their requirements are similar. But we're kind of fighting with it, esp. with the canary approach, but even with the waitForCleaning approach. I think things could be simpler and easier to understand with a direct implementation. And Cleaner was not intended to be the last word in the area of reference-based cleanup. Rather, it's a convenient package for addressing some common patterns (esp. with finalization). It provides some useful infrastructure that would need to be replaced in a bespoke implementation, but I don't think there's anything all that large or complicated there, and some streamlining is possible. Note that I haven't done any work in that direction yet. (3) Do we need the retry stuff, and how should it be done? I did some monitoring of the retries, and waitForCleaning DBBA does need the retries (though rarely more than 1, and I never saw more than 2). I kind of feel like the retries are a kludgy workaround for what seems like an unreasonable test, but oh well. The point of the retries is to allow other threads to drop some DBBs that can become reclaimable during the associated wait, giving the waiting thread an opportunity to make use of that now available memory. I experimented with a retry-style more similar to the current code, where each thread on the slow path will do at most one GC and then go into a wait/sleep loop if reservation continues to fail. (I added some locking to avoid the concurrent requests for GC issue with the current code.) That didn't seem to make any difference compared to something like the proposed serialized GC and sleep loop. I can construct theoretical cases that seem to favor either, but the style in the proposal seems simpler. Of course, the question of when to do GCs and how many to do is moot if -XX:+DisableExplicitGC is used. (The default for that option is false.) src/java.base/share/classes/java/nio/Bits.java line 149: > 147: // 1. GC needs to discover dead references and hand them over > to Reference > 148: // processing thread. This activity can be asynchronous and > can complete after > 149: // we unblock from System.gc(). Adding cleared references to the pending list is always completed before the GC invocation completes. Doing otherwise would break or significantly complicate Reference.waitForReferenceProcessing(), and to no good purpose. That function should only return false when all references cleared by a prior GC have been enqueued in their respective queues. There are tests that depend on that. (I looked, and so far as I can tell, none of the extant GCs violate this.) src/java.base/share/classes/java/nio/Bits.java line 166: > 164: // install the canary, call System.gc(), wait for canary to get > processed (dead). This > 165: // signals that since our last call to System.gc(), steps (1) > and (2) have finished, and > 166: // step (3) is currently in progress. The canary having been processed doesn't tell us anything definitive about step (2), because steps (2) and (3) occur concurrently. The reference processing thread transfers references to their respective queues, and the cleaner thread processes references that have been placed in its queue, both running at the same time. All we know is that the canary got transferred and cleaned. There may or many not have been other references similarly transferred and cleaned. There may or may not be more references in the pending list, in the cleaner queue, or both. That the canary has been cleaned doesn't give us any information about either the pending list or the cleaner queue. src/java.base/share/classes/java/nio/Bits.java line 172: > 170: // corner case would be handled with a normal retry attempt, > after trying to allocate. > 171: // If allocation succeeds even after partial cleanup, we are > done. If it does not, we get > 172: // to try again, this time reliably getting the results of the > first cleanup run. Not After trying again, all we know is that both the previous and the new canary have been processed. We don't know anything about other references, either from the latest or preceeding GCs. Consider this unfortunate scenario. The first canary ended up at the head of the pending list. The reference processing thread transfered it to the cleaner queue and then stalled. The cleaner thread processed the first canary. The waiting thread noted that, retried and failed the reservation, and retried the GC and wait for the canary. The retry canary also happened to end up at the front of the updated pending list. The reference processor transferred it and again stalled. The cleaner thread processed the retry canary. No real cleaning has happened, i.e. we have not reliably gotten the results of the first cleanup run. ------------- Changes requested by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22165#pullrequestreview-2635485651 PR Review Comment: https://git.openjdk.org/jdk/pull/22165#discussion_r1966720234 PR Review Comment: https://git.openjdk.org/jdk/pull/22165#discussion_r1966720361 PR Review Comment: https://git.openjdk.org/jdk/pull/22165#discussion_r1966720509