Bill Burcham created GEODE-10122:
------------------------------------
Summary: With TLSv1.3 and GCM-based cipher (the default), P2P
Messaging Fails When Encrypted Data Limit is Reached
Key: GEODE-10122
URL: https://issues.apache.org/jira/browse/GEODE-10122
Project: Geode
Issue Type: Bug
Affects Versions: 1.14.3, 1.13.7, 1.15.0, 1.16.0
Reporter: Bill Burcham
Attachments: patch-P2PMessagingConcurrencyDUnitTest.txt
TLSv1.3 introduced [1] the ability to set per-algorithm limits on symmetric key
usage lifetimes. Once a certain number of bytes have been encrypted, a
KeyUpdate post-handshake message is sent.
With default settings, on Liberica JDK 11, Geode's P2P framework will negotiate
TLSv1.3 with the TLS_AES_256_GCM_SHA384 cipher suite. Geode P2P messaging will
eventually fail, with a "Tag mismatch!" IOException in shared ordered
receivers, after a session has been in heavy use for days.
We have not seen this failure on TLSv1.2.
The implementation of TLSv1.3 in the Java runtime provides a security property
[2] to configure the encrypted data limit. The attached patch to
P2PMessagingConcurrencyDUnitTest configures the limit large enough that the
test makes it through the (P2P) TLS handshake but small enough so that the "Tag
mismatch!" exception is encountered less than a minute later.
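For readers who want to reproduce this without the attached patch, here is a
minimal sketch of lowering the limit programmatically. It assumes the security
property in [2] is jdk.tls.keyLimits and that its value has the form
"<algorithm> KeyUpdate <length>". The class name, the method name, and the
2^20 limit used below are hypothetical and are not taken from the patch.

```java
import java.security.Security;

final class KeyLimitConfig {
    // Assumed name of the JDK security property that holds the TLS key usage limits.
    static final String KEY_LIMITS_PROPERTY = "jdk.tls.keyLimits";

    /**
     * Lowers the AES/GCM encrypted-data limit to 2^exponent bytes and returns
     * the value the property had before (null if it was unset).
     */
    static String lowerAesGcmKeyLimit(int exponent) {
        String previous = Security.getProperty(KEY_LIMITS_PROPERTY);
        // Illustrative value only; the limit chosen in the attached patch may differ.
        Security.setProperty(KEY_LIMITS_PROPERTY,
            "AES/GCM/NoPadding KeyUpdate 2^" + exponent);
        return previous;
    }
}
```

As far as I can tell the JSSE provider reads this property once, when its
cipher classes are first loaded, so a call such as
KeyLimitConfig.lowerAesGcmKeyLimit(20) would need to run before the JVM
creates its first TLS connection.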
The bug is caused by Geode's NioSslEngine class ignoring the "rehandshaking"
phase of the TLS protocol [3]:
  1. Creation - ready to be configured.
  2. Initial handshaking - perform authentication and negotiate communication
     parameters.
  3. Application data - ready for application exchange.
  4. *Rehandshaking* - renegotiate communications parameters/authentication;
     handshaking data may be mixed with application data.
  5. Closure - ready to shut down connection.
Geode's tcp.Connection and NioSslEngine classes (particularly wrap() and
unwrap()), as they are currently implemented, fail to fully attend to the
handshake status from javax.net.ssl.SSLEngine. As a result these Geode classes
fail to respond to the KeyUpdate message, resulting in the "Tag mismatch!"
IOException.
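To illustrate what attending to the handshake status could look like, here is
a rough sketch that services the engine's handshake status before application
data is wrapped again. This is not Geode's code and not the proposed fix. The
class and method names are hypothetical, and writing netOut to the socket is
left to the caller. It assumes it would be called after every wrap() and
unwrap() whose result reports a status other than NOT_HANDSHAKING.

```java
import java.nio.ByteBuffer;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLEngineResult;
import javax.net.ssl.SSLEngineResult.HandshakeStatus;
import javax.net.ssl.SSLException;

final class HandshakeAwareIo {
    /**
     * Performs the local work the engine asks for (delegated tasks and wraps of
     * handshake data) and returns the first status that needs something else,
     * for example NEED_UNWRAP or NOT_HANDSHAKING. Handshake bytes produced by
     * the engine are appended to netOut; the caller must send them to the peer.
     */
    static HandshakeStatus serviceHandshake(SSLEngine engine, ByteBuffer netOut)
            throws SSLException {
        ByteBuffer noAppData = ByteBuffer.allocate(0);
        HandshakeStatus status = engine.getHandshakeStatus();
        while (true) {
            switch (status) {
                case NEED_TASK: {
                    // Run all pending delegated tasks on the calling thread.
                    Runnable task;
                    while ((task = engine.getDelegatedTask()) != null) {
                        task.run();
                    }
                    status = engine.getHandshakeStatus();
                    break;
                }
                case NEED_WRAP: {
                    // The engine has handshake data to send (possibly a
                    // post-handshake message); wrap with no application data.
                    SSLEngineResult result = engine.wrap(noAppData, netOut);
                    if (result.getStatus() != SSLEngineResult.Status.OK) {
                        // Buffer too small or engine closed: let the caller decide.
                        return result.getHandshakeStatus();
                    }
                    status = result.getHandshakeStatus();
                    break;
                }
                default:
                    // NEED_UNWRAP, NEED_UNWRAP_AGAIN, FINISHED, NOT_HANDSHAKING:
                    // nothing more can be done locally without reading or writing.
                    return status;
            }
        }
    }
}
```

The point of the sketch is only that the HandshakeStatus carried by each
SSLEngineResult is inspected during the application-data phase too, not just
during the initial handshake.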
When that exception is encountered, the Connection is destroyed and a new one
created in its place. But users of the old Connection, waiting for
acknowledgements, will never receive them. This can result in cluster-wide
hangs.
[1] [https://datatracker.ietf.org/doc/html/rfc8446#section-5.5]
[2] [https://docs.oracle.com/en/java/javase/11/security/java-secure-socket-extension-jsse-reference-guide.html#GUID-B970ADD6-1E9F-4C18-A26E-0679B50CC946]
[3] [https://www.ibm.com/docs/en/sdk-java-technology/7.1?topic=sslengine-]