There is a race condition in JDK's SessionManager, which can lead to random 
exceptions.

**Exception:**

javax.net.ssl.SSLException: Internal error: close session with active objects
        at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:133)
        at 
java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:371)
        at 
java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:314)
        at 
java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:309)
        at 
java.base/sun.security.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1707)
        at 
java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1080)
        at 
java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:971)
        at SSLSocketServer.serverLoop(SSLSocketServer.java:133)
        at SSLSocketServer$1.run(SSLSocketServer.java:75)
        at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.security.ProviderException: Internal error: close session with 
active objects
        at 
jdk.crypto.cryptoki/sun.security.pkcs11.Session.close(Session.java:127)
        at 
jdk.crypto.cryptoki/sun.security.pkcs11.Session.close(Session.java:114)
        at 
jdk.crypto.cryptoki/sun.security.pkcs11.SessionManager.closeSession(SessionManager.java:237)
        at 
jdk.crypto.cryptoki/sun.security.pkcs11.SessionManager$Pool.release(SessionManager.java:270)
        at 
jdk.crypto.cryptoki/sun.security.pkcs11.SessionManager.demoteObjSession(SessionManager.java:210)
        at 
jdk.crypto.cryptoki/sun.security.pkcs11.Session.removeObject(Session.java:101)
        at 
jdk.crypto.cryptoki/sun.security.pkcs11.SessionKeyRef.updateNativeKey(P11Key.java:1396)
        at 
jdk.crypto.cryptoki/sun.security.pkcs11.SessionKeyRef.removeNativeKey(P11Key.java:1377)
        at 
jdk.crypto.cryptoki/sun.security.pkcs11.NativeKeyHolder.releaseKeyID(P11Key.java:1329)
        at 
jdk.crypto.cryptoki/sun.security.pkcs11.P11Key.releaseKeyID(P11Key.java:156)
        at 
jdk.crypto.cryptoki/sun.security.pkcs11.P11AEADCipher.reset(P11AEADCipher.java:529)
        at 
jdk.crypto.cryptoki/sun.security.pkcs11.P11AEADCipher.ensureInitialized(P11AEADCipher.java:436)
        at 
jdk.crypto.cryptoki/sun.security.pkcs11.P11AEADCipher.implDoFinal(P11AEADCipher.java:732)
        at 
jdk.crypto.cryptoki/sun.security.pkcs11.P11AEADCipher.engineDoFinal(P11AEADCipher.java:624)
        at java.base/javax.crypto.Cipher.doFinal(Cipher.java:2500)
        at 
java.base/sun.security.ssl.SSLCipher$T12GcmReadCipherGenerator$GcmReadCipher.decrypt(SSLCipher.java:1659)
        at 
java.base/sun.security.ssl.SSLSocketInputRecord.decodeInputRecord(SSLSocketInputRecord.java:260)
        at 
java.base/sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:181)
        at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:111)
        at 
java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1508)
        at 
java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1479)
        at 
java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1064)
        ... 4 more


**Reproducibility:**
I started getting this exception quite reliably on JDK17 on my machine with one 
particular test setup using ssl-tests testsuite. Unfortunately setup itself 
needed some RH specific patches and also ability to reproduce depends on other 
factors such number keys in keystore, machine where testing was performed... I 
tried a bit to create some reproducer, but I couldn't find a way to easily 
reproduce this issue :(

**Problem:**
SunPKCS11 provider does session pooling. This is done in SessionManager [1] 
(one per SunPKCS11 provider). Released sessions are kept by SessionManager for 
a while, for reuse (in limited number). This however is a bit complicated as 
some sessions can own objects (e.g. keys). So there are actually 2 pools. One 
for sessions with objects ("objSessions") and one for sessions without objects 
("opSessions"). This is because sessions without objects, which are not being 
used, can be safely closed (SessionManager only keeps around limited amount of 
these), while sessions with objects cannot be safely closed (until all objects 
are removed from them). Session manager has methods for getting Session for 
given purpose (object creation or just doing other operations), prioritizing 
appropriate pool. Each session has counter (called "createdObjects") to track 
how many objects it owns. When session is being returned to pool this counter 
is checked and session is placed to appropriate pool. Also when 
 counter for some Session in "objSessions" pool reaches zero it is moved 
("demoted") to "opSessions" pool.

And here comes complicated part. As far as I understand it, Session.addObject() 
[2] (which increases "createdObjects" counter) is always being called by thread 
"holding" session which owns the created object. (That is: thread gets a 
session, uses it to create an object and calls Session.addObject() on that 
session to increase the counter, before returning the session to pool. See 
e.g.: [3]) However this is not true for Session.removeObject() [4]. (That is: 
thread gets session, which is not necessary the same one owning object being 
removed, performs object removal, but then calls Session.removeObject() on 
session which owned that object. See e.g.: [5]) That is Session.removeObject() 
can be called on Session which is in "objSessions" pool or which is being used 
be other thread. (object removal can happen as result of releasing key, either 
explicitly or as result of GC etc..). 

And finally, there is a problem in code handling object removal from a session. 
Session.removeObject() [4] first checks if "createdObjects" counter reached 
zero. If so, it calls SessionManager.demoteObjSession(this) [6], which attempts 
to remove Session from objSessions pool, if session is successfully removed 
from there, meaning no other thread "holds" this session, session is put to 
opSessions pool, if not (meaning other thread "holds" it), method just returns, 
since that other thread puts this session to appropriate pool, when it is done 
with it by calling SessionManager.releaseSession(session).

There is race condition here. Consider following scenario:

// Thread T1 runs:
Session.removeObject() // [4]
createdObjects.decrementAndGet() // returns zero

// Thread T2 steps in (operating on the same session instance):
Session.addObject() // increases "createdObjects" counter [2]
SessionManager.releaseSession(session) // releases session to objSessions pool

// Thread T1 continues:
SessionManager.demoteObjSession(this) // [6]
objSessions.remove(session) // returns true
opSessions.release(session)  // puts session (with objects!) to opSessions pool
// if opSessions is already full, close of session with objects is attempted 
throwing Exception..


**Fix:**
SessionManager.demoteObjSession [6] method was changed, so that check for 
objects is done once again if session was successfully removed from 
"objSessions" pool (now that it is out of pool and other threads should not be 
adding objects to it). Based on this check session is either released to 
"opSessions" pool or returned to "objSessions" pool. This can be achieved by 
calling releaseSession(session) instead of opSessions.release(session).

**Testing:**
jdk_security tests passed for me locally with this change.
I have also tested this change on top of custom JDK17 build which allows 
scenario, where I can reproduce this issue. Problem got fixed.

-------------

Commit messages:
 - Fix race condition in pkcs11 SessionManager

Changes: https://git.openjdk.org/jdk/pull/10125/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10125&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8293232
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/10125.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10125/head:pull/10125

PR: https://git.openjdk.org/jdk/pull/10125

Reply via email to