A few weeks ago, we rolled out TLS among hosts in our clusters (running 4.0.7). More recently we also rolled out TLS between Cassandra clients and the cluster. Today, we started seeing a lot of dropped actions in one cluster that correlate with warnings like this:
WARN [epollEventLoopGroup-5-31] 2023-04-12 15:43:34,476 PreV5Handlers.java:261 - Unknown exception in client networking io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: error:10000104:SSL routines:OPENSSL_internal:TOO_MANY_KEY_UPDATES at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:478) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base/java.lang.Thread.run(Thread.java:829) Caused by: javax.net.ssl.SSLException: error:10000104:SSL routines:OPENSSL_internal:TOO_MANY_KEY_UPDATES at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.shutdownWithError(ReferenceCountedOpenSslEngine.java:1028) at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.sslReadErrorResult(ReferenceCountedOpenSslEngine.java:1321) at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1270) at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1346) at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1389) at io.netty.handler.ssl.SslHandler$SslEngineType$1.unwrap(SslHandler.java:206) at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1387) at io.netty.handler.ssl.SslHandler.decodeNonJdkCompatible(SslHandler.java:1294) at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1331) at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:508) at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:447) ... 15 common frames omitted INFO [ScheduledTasks:1] 2023-04-12 15:46:19,701 MessagingMetrics.java:206 - READ_RSP messages were dropped in last 5000 ms: 0 internal and 3 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 5960 ms This looks similar to a bug in OpenSSL fixed in 2019: https://github.com/openssl/openssl/pull/8299 but the equivalent change doesn't seem to have been ported over to BoringSSL. Has anyone else run across this, or have some sort of workaround? -- This email, including its contents and any attachment(s), may contain confidential and/or proprietary information and is solely for the review and use of the intended recipient(s). If you have received this email in error, please notify the sender and permanently delete this email, its content, and any attachment(s). Any disclosure, copying, or taking of any action in reliance on an email received in error is strictly prohibited.