Hello,

We have run into a deadlock in Kafka several times; a similar issue is reported 
here: https://issues.apache.org/jira/browse/KAFKA-13544

Our question: is it expected behavior for Kafka to shut down because of 
connectivity problems with ZooKeeper?
It seems to be related to the broker being unable to read the /feature ZK node. 
The ZooKeeperClient class throws a ZooKeeperClientExpiredException, which is 
caught only in the catch block of the doWork() method in 
ChangeNotificationProcessorThread, where it results in a FatalExitError.
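To make the failure path above concrete, here is a deliberately simplified Java 
model of it. It is not Kafka's code: FakeZkClient, SessionExpiredException, 
ProcessorThread and processEvent are hypothetical stand-ins for ZooKeeperClient, 
ZooKeeperClientExpiredException, ChangeNotificationProcessorThread.doWork() and 
the broker's exit handling. It assumes, as described above, that doWork() is the 
only place the expiry exception is caught and that it is always turned into a 
FatalExitError.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical, simplified model of the failure path described above.
 * None of these classes are Kafka's real implementation; they only mirror the
 * roles of ZooKeeperClientExpiredException, ChangeNotificationProcessorThread
 * and FatalExitError.
 */
public class FeatureListenerModel {

    /** Stand-in for ZooKeeperClientExpiredException. */
    static class SessionExpiredException extends RuntimeException {
        SessionExpiredException(String msg) { super(msg); }
    }

    /** Stand-in for FatalExitError: signals that the broker should exit. */
    static class FatalExitError extends Error {
        FatalExitError(Throwable cause) { super(cause); }
    }

    /** Stand-in for the ZooKeeper client; the session state is toggled by hand. */
    static class FakeZkClient {
        final AtomicBoolean sessionExpired = new AtomicBoolean(false);

        byte[] getData(String path) {
            if (sessionExpired.get()) {
                throw new SessionExpiredException("Session expired while reading " + path);
            }
            return new byte[0];
        }
    }

    /** Stand-in for the notification processor thread's doWork() body. */
    static class ProcessorThread {
        private final FakeZkClient zk;

        ProcessorThread(FakeZkClient zk) { this.zk = zk; }

        void doWork() {
            try {
                // Plays the role of updateLatestOrThrow() reading the /feature node.
                zk.getData("/feature");
            } catch (Exception e) {
                // Assumed single handler: any exception, including a session
                // expiry, is escalated to a fatal error instead of being retried.
                throw new FatalExitError(e);
            }
        }
    }

    /** Returns true if processing one change event escalated to a fatal error. */
    static boolean processEvent(FakeZkClient zk) {
        try {
            new ProcessorThread(zk).doWork();
            return false;
        } catch (FatalExitError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        FakeZkClient zk = new FakeZkClient();
        System.out.println("healthy session -> fatal? " + processEvent(zk));
        zk.sessionExpired.set(true);
        System.out.println("expired session -> fatal? " + processEvent(zk));
    }
}
```

Running main prints false for the healthy session and true once the session is 
marked expired: in this model, a single session expiry during the /feature read 
is enough to take the fatal path, with no retry in between.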

The problem also reproduces in newer versions of Kafka (which already include 
the fix for the deadlock).

It is hard to write a synthetic test that reproduces the problem, but it can be 
reproduced locally in debug mode with the following steps:
1) Start ZooKeeper, then start Kafka in debug mode.
2) Emulate a connectivity problem between Kafka and ZooKeeper; for example, the 
connection can be closed with the Netcrusher library.
3) Set a breakpoint in the updateLatestOrThrow() method of the 
FeatureCacheUpdater class, just before the 
zkClient.getDataAndVersion(featureZkNodePath) line is executed.
4) Restore the connection between Kafka and ZooKeeper after the session has 
expired. Kafka execution should stop at the breakpoint.
5) Resume execution until Kafka reaches the line 
zooKeeperClient.handleRequests(remainingRequests) in the 
retryRequestsUntilConnected method of the KafkaZkClient class.
6) Emulate a connectivity problem between Kafka and ZooKeeper again and wait 
until the session expires.
7) Restore the connection between Kafka and ZooKeeper.
8) Kafka begins the shutdown process with the following error:
ERROR [feature-zk-node-event-process-thread]: Failed to process feature ZK node 
change event. The broker will eventually exit. 
(kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)

In a real environment, the same problem can be triggered by network issues that 
make the broker disconnect from and reconnect to ZooKeeper repeatedly within a 
short period of time.

So, the question is: is it by design that Kafka begins the shutdown process in 
such scenarios, or is it a defect?

Regards,
