On Thu, 15 Feb 2024 11:13:17 GMT, Daniel Jeliński <djelin...@openjdk.org> wrote:
> The reported leak was caused by the death of the `Cleanup-SunPKCS11` thread.
> The cleanup thread in turn died because of an exception thrown from
> `removeNativeKey` that resulted from 2 threads executing that method at the
> same time.
>
> This PR adds a reachabilityFence to ensure that the key will only be enqueued
> for cleanup after the user thread is done with the `removeNativeKey` call.
>
> No new regression test; the issue is extremely hard to reproduce in a
> reasonable time. Existing tier1-3 tests continue to pass.
>
> In JBS I attached a PoC patch that changes the relative timing of operations;
> with that patch and without the changes from this PR I am able to reproduce
> the issue within a few seconds. With the changes from this PR the issue did
> not reproduce after 10 minutes of testing.

src/jdk.crypto.cryptoki/share/classes/sun/security/pkcs11/P11Key.java line 1537:

> 1535:     this.ref.removeNativeKey();
> 1536:     // prevent enqueuing SessionKeyRef until removeNativeKey is done
> 1537:     Reference.reachabilityFence(this);

The approach we are now taking is to put the reachabilityFence() call within
the finally-clause of a try-finally statement. This ensures that all paths
through the method will pass through the reachability fence, regardless of
inlining or other JIT optimizations.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17870#discussion_r1493390418
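
For illustration, the try-finally placement suggested above could look roughly like the sketch below. This is not the actual P11Key code: `FenceSketch`, `Key`, `KeyRef`, `release()` and `demo()` are hypothetical stand-ins, and only `Reference.reachabilityFence` is the real JDK API (available since Java 9).

```java
import java.lang.ref.Reference;
import java.util.concurrent.atomic.AtomicInteger;

public class FenceSketch {
    // Hypothetical stand-in for the reference object that owns the native key.
    static final class KeyRef {
        final AtomicInteger removals = new AtomicInteger();

        void removeNativeKey() {
            // The real method would release the native handle; here we only count calls.
            removals.incrementAndGet();
        }
    }

    // Hypothetical stand-in for the key object held by user code.
    static final class Key {
        final KeyRef ref = new KeyRef();

        void release() {
            try {
                this.ref.removeNativeKey();
            } finally {
                // Keeps `this` strongly reachable until removeNativeKey() has
                // returned or thrown, so the key cannot be enqueued for cleanup
                // while the user thread is still inside that call.
                Reference.reachabilityFence(this);
            }
        }
    }

    // Runs the pattern once and reports how many times the key was removed.
    static int demo() {
        Key key = new Key();
        key.release();
        return key.ref.removals.get();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Putting the fence in the finally-clause means it is also reached when `removeNativeKey()` throws, which a fence placed as the last statement of the method body would not guarantee.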