Jun Rao created KAFKA-7836:
------------------------------
Summary: The propagation of log dir failure can be delayed due to
slowness in closing the file handles
Key: KAFKA-7836
URL: https://issues.apache.org/jira/browse/KAFKA-7836
Project: Kafka
Issue Type: Improvement
Reporter: Jun Rao
In ReplicaManager.handleLogDirFailure(), we call zkClient.propagateLogDirEvent
after logManager.handleLogDirFailure. The latter closes the file handles of
the offline replicas, which can take a long time when the disk is bad. This
delays the election of new leaders by the controller. In one incident, closing
the file handles of multiple replicas took more than 20 seconds.
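
The ordering problem described above can be illustrated with a minimal Java
sketch. It is not Kafka's actual code: the Notifier and LogCloser interfaces,
the class name, and the sample path are hypothetical stand-ins for
zkClient.propagateLogDirEvent and logManager.handleLogDirFailure. It shows one
possible remedy, assuming it is safe to notify the controller before the file
handles are closed, which the report itself does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the call ordering; none of these types exist in Kafka.
class LogDirFailureOrdering {

    // Stand-in for zkClient.propagateLogDirEvent (assumed to be fast).
    interface Notifier {
        void propagateLogDirEvent(int brokerId);
    }

    // Stand-in for logManager.handleLogDirFailure (may block on a bad disk).
    interface LogCloser {
        void handleLogDirFailure(String dir);
    }

    // Notify the controller first so that leader election is not held up
    // by slow file-handle closes, then do the potentially slow cleanup.
    static void handleLogDirFailure(int brokerId, String dir,
                                    Notifier notifier, LogCloser closer) {
        notifier.propagateLogDirEvent(brokerId);
        closer.handleLogDirFailure(dir);
    }

    // Records the order in which the two steps run.
    static List<String> demo() {
        List<String> order = new ArrayList<>();
        handleLogDirFailure(1, "/tmp/kafka-logs-1",
                id -> order.add("notify:" + id),
                dir -> order.add("close:" + dir));
        return order;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With this ordering, the time spent closing file handles no longer sits between
the failure and the controller notification. Whether the real broker state
permits notifying this early would need to be checked against the actual
ReplicaManager code.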