Igor Soarez created KAFKA-15649:
-----------------------------------
Summary: Handle directory failure timeout
Key: KAFKA-15649
URL: https://issues.apache.org/jira/browse/KAFKA-15649
Project: Kafka
Issue Type: Sub-task
Reporter: Igor Soarez
If a broker with an offline log directory repeatedly fails to notify the
controller of either:
* the fact that the directory is offline; or
* any replica assignment into a failed directory
then the controller will not check whether a leadership change is required, and
the affected partitions may remain offline indefinitely.
KIP-858 proposes that the broker should shut down after a configurable timeout
to force a leadership change. Alternatively, the broker could also request to
be fenced, as long as there's a path for it to later become unfenced.
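To make the shutdown option concrete, here is a minimal sketch of the timeout bookkeeping, not the actual Kafka implementation. The class name, method names, and the idea of passing the current time in explicitly are all assumptions made for illustration; the real broker code and the configuration name are not described in this issue.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the timeout bookkeeping described above.
 * None of these names come from the Kafka code base.
 */
public class DirectoryFailureTimeout {
    private final long timeoutMs;

    // Failed directory -> time (ms) at which the failure was first detected
    // and has not yet been acknowledged by the controller.
    private final Map<String, Long> pendingSinceMs = new HashMap<>();

    public DirectoryFailureTimeout(long timeoutMs) {
        if (timeoutMs <= 0) {
            throw new IllegalArgumentException("timeoutMs must be positive");
        }
        this.timeoutMs = timeoutMs;
    }

    /** Record that a directory went offline; keeps the earliest detection time. */
    public void onDirectoryFailed(String dir, long nowMs) {
        pendingSinceMs.putIfAbsent(dir, nowMs);
    }

    /** The controller has learned about the failure, so stop the clock. */
    public void onControllerAcknowledged(String dir) {
        pendingSinceMs.remove(dir);
    }

    /** True once any unacknowledged failure is at least timeoutMs old. */
    public boolean shouldShutDown(long nowMs) {
        for (long since : pendingSinceMs.values()) {
            if (nowMs - since >= timeoutMs) {
                return true;
            }
        }
        return false;
    }
}
```

A caller would check shouldShutDown periodically and begin a shutdown when it returns true. Requesting to be fenced instead would use the same trigger, only with a different action.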
While this unavailability is possible in theory, in practice it's hard to
imagine a scenario where a broker continues to appear healthy to the
controller but fails to send this information. So it's not clear whether this
is a real problem.