Hi. We've suffered a single-node hardware failure (broker_id 4). Every topic partition had at least 2 replicas, but some partitions are now leaderless (all of these were replicated across brokers 4 and 5):
Topic: topic.with.two.replicas Partition: 0 Leader: -1 Replicas: 4,5 Isr:

On broker 5, we see warnings like this in the logs:

/var/log/kafka/kafka.log.2:[2014-12-05 05:21:28,216] 19186668 [kafka-request-handler-4] WARN kafka.server.ReplicaManager - [Replica Manager on Broker 5]: While recording the follower position, the partition [topic.with.two.replicas,0] hasn't been created, skip updating leader HW

/var/log/kafka/kafka.log.2:[2014-12-05 05:21:28,219] 19186671 [kafka-request-handler-4] WARN kafka.server.KafkaApis - [KafkaApi-5] Fetch request with correlation id 36397 from client ReplicaFetcherThread-1-5 on partition [topic.with.two.replicas,0] failed due to Topic topic.with.two.replicas either doesn't exist or is in the process of being deleted

We also have some topics with 3 replicas that are now leaderless as well:

Topic:topic.with.three.replicas PartitionCount:6 ReplicationFactor:3 Configs:
Topic: topic.with.three.replicas Partition: 0 Leader: none Replicas: 3,1,2 Isr:

Their 'state' node in zookeeper has apparently disappeared:

'/brokers/topics/topic.with.three.replicas/partitions/3/state': NoNodeError((), {})

Our versions are:

kafka 0.8.1
zookeeper 3.4.5

From searching the archives of this list, the recommended "fix" is to blow away the topic(s) and recreate them. At this point in time, that's an option, but it's not really acceptable for a reliable data pipeline. Are there options to repair specific partitions?

-neil