Hi,

We have been using the MirrorMaker from Kafka 2.0 (which uses the high-level
consumer) for replication.  The topic is SSL-enabled and the certificate
expires at a random time within 12 hours.  When the certificate expires,
we see many SSL-related exceptions in the log:

[2019-03-07 18:02:54,128] ERROR [Consumer
clientId=kafkamirror-euw1-use1-m10nkafka03-1,
groupId=kafkamirror-euw1-use1-m10nkafka03] Connection to node 3005 failed
authentication due to: SSL handshake failed
(org.apache.kafka.clients.NetworkClient)


This error repeats for several hours.


However, even with the SSL error, the preexisting socket connections still
work, so the main fetching activity is not actually affected.  The metadata
operations from the client and the heartbeats from the heartbeat thread are
affected, since they might open new socket connections.  I think the errors
most likely originate from those side activities.


The situation lasts several hours, until the main fetcher thread tries
to open a new connection (usually due to a consumer rebalance).  At that
point the SSL authentication exception aborts the operation and MirrorMaker
exits.


During those several hours, the client is unable to get the latest
metadata and heartbeats also falter (we see rebalancing triggered because
of this).


When NetworkClient.processDisconnection() prints the ERROR message above,
could it just throw the AuthenticationException up to the caller?  That
would kill KafkaConsumer.poll() and speed up the certificate recycling
(in our case, we would restart MirrorMaker with the new certificate).
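
In the meantime, one possible workaround on the operator side is to watch
the MirrorMaker log for the ERROR line quoted above and trigger the restart
externally.  Below is a minimal sketch in Python, not something we run in
production.  The function names and the threshold of 3 failures are
assumptions made for illustration, and the pattern is based only on the
single log line shown above, so it may need adjusting for other log layouts.

```python
import re

# Matches the NetworkClient SSL authentication error quoted above.
# Whitespace is matched loosely so a wrapped or re-joined line still matches.
AUTH_FAILURE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\]\s+ERROR\s+\[Consumer\s+"
    r"clientId=(?P<client_id>[^,\s]+),\s+"
    r"groupId=(?P<group_id>[^\]\s]+)\]\s+"
    r"Connection\s+to\s+node\s+(?P<node>-?\d+)\s+failed\s+"
    r"authentication\s+due\s+to:\s+SSL\s+handshake\s+failed"
)


def parse_auth_failure(line):
    """Return the fields of an SSL authentication failure line, or None."""
    match = AUTH_FAILURE.search(line)
    return match.groupdict() if match else None


def should_restart(lines, threshold=3):
    """Hypothetical policy: restart once `threshold` failures are seen.

    The threshold is an assumed value, chosen only so that a single
    transient handshake failure does not trigger a restart.
    """
    failures = sum(1 for line in lines if parse_auth_failure(line) is not None)
    return failures >= threshold
```

A wrapper script could feed recent log lines to should_restart() and, when
it returns True, restart MirrorMaker with the new certificate instead of
waiting for the fetcher thread to hit the error on its own.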
