I looked at the latest code (Kafka 2.2.0) in NetworkClient.java and don't see any changes in the related code, so I believe the problem exists there as well.
Created this JIRA to track the issue: https://issues.apache.org/jira/browse/KAFKA-8089

On Thu, Mar 7, 2019 at 11:39 PM Ismael Juma <ism...@juma.me.uk> wrote:

> Hi,
>
> It would be great to verify that this happens with Kafka 2.2.0 RC1. If it
> does, then please file a JIRA so that this doesn't get lost.
>
> Ismael
>
> On Thu, Mar 7, 2019 at 4:19 PM Henry Cai <h...@pinterest.com.invalid>
> wrote:
>
> > Hi,
> >
> > We have been using Kafka 2.0's mirror maker (which used High level
> > consumer) to do replication. The topic is SSL enabled and the
> > certificate will expire at a random time within 12 hours. When the
> > certificate expired we will see many SSL related exception in the log
> >
> > [2019-03-07 18:02:54,128] ERROR [Consumer
> > clientId=kafkamirror-euw1-use1-m10nkafka03-1,
> > groupId=kafkamirror-euw1-use1-m10nkafka03] Connection to node 3005 failed
> > authentication due to: SSL handshake failed
> > (org.apache.kafka.clients.NetworkClient)
> >
> > This error will repeat for several hours.
> >
> > However even with the SSL error, the preexisting socket connection will
> > still work so the main fetching activities is actually not affected, but
> > the metadata operations from the client and the heartbeats from heartbeat
> > thread will be affected since they might open new socket connections. I
> > think those errors are most likely originated from those side activities.
> >
> > The situation will last several hours until the main fetcher thread tried
> > to open a new connection (usually due to consumer rebalance) and then the
> > SSL Authentication exception will abort the operation and mirror maker
> > will exit.
> >
> > During that several hours, the client wouldn't be able to get the latest
> > metadata and heartbeats also falters (we see rebalancing triggered
> > because of this).
> >
> > In NetworkClient.processDisconnection(), when the above method prints the
> > ERROR message, can it just throw the AuthenticationException up, this
> > will kill the KafkaConsumer.poll(), and this will speedup the certificate
> > recycle (in our case, we will restart the mirror maker with the new
> > certificate)