On Thu, 15 Feb 2024 11:13:17 GMT, Daniel Jeliński <djelin...@openjdk.org> wrote:

> The reported leak was caused by the death of the `Cleanup-SunPKCS11` thread. 
> The cleanup thread in turn died because of an exception thrown from 
> `removeNativeKey` that resulted from 2 threads executing that method at the 
> same time.
> 
> This PR adds a reachabilityFence to ensure that the key will only be enqueued 
> for cleanup after the user thread is done with the `removeNativeKey` call.
> 
> No new regression test; the issue is extremely hard to reproduce in a 
> reasonable time. Existing tier1-3 tests continue to pass.
> 
> In JBS I attached a PoC patch that changes the relative timing of operations; 
> with that patch and without the changes from this PR I am able to reproduce 
> the issue within a few seconds. With the changes from this PR the issue did 
> not reproduce after 10 minutes of testing.

src/jdk.crypto.cryptoki/share/classes/sun/security/pkcs11/P11Key.java line 1537:

> 1535:                     this.ref.removeNativeKey();
> 1536:                     // prevent enqueuing SessionKeyRef until removeNativeKey is done
> 1537:                     Reference.reachabilityFence(this);

The approach we are now taking is to put the reachabilityFence() call in the 
finally clause of a try-finally statement. This ensures that every path 
through the method, including exceptional exits, passes through the 
reachability fence, regardless of inlining or other JIT optimizations.
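
For illustration, a minimal sketch of that pattern. The classes `NativeRef` and 
`Key` below are hypothetical stand-ins, not the real `SessionKeyRef` or 
`P11Key` code; only `Reference.reachabilityFence` is an actual JDK API (Java 9 
and later).

```java
import java.lang.ref.Reference;

public class FenceSketch {

    /** Hypothetical stand-in for the object that owns the native handle. */
    public static final class NativeRef {
        private int released = 0;

        // Stand-in for the real native cleanup; may throw in real code.
        public void removeNativeKey() {
            released++;
        }

        public int releasedCount() {
            return released;
        }
    }

    /** Hypothetical stand-in for the key that user code holds on to. */
    public static final class Key {
        public final NativeRef ref = new NativeRef();

        public void release() {
            try {
                ref.removeNativeKey();
            } finally {
                // Keep `this` strongly reachable until removeNativeKey() has
                // returned or thrown, so that a reference to it cannot be
                // enqueued for cleanup while the call is still in progress.
                Reference.reachabilityFence(this);
            }
        }
    }

    public static void main(String[] args) {
        Key key = new Key();
        key.release();
        System.out.println("released: " + key.ref.releasedCount());
    }
}
```

With the fence in the finally clause, both the normal return and any exception 
thrown by `removeNativeKey()` go through it; a fence placed after the call as 
an ordinary statement is skipped on the exceptional path.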

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17870#discussion_r1493390418
