leaderless topicparts after single node failure: how to repair?

Neil Harkins Tue, 09 Dec 2014 08:45:41 -0800

Hi. We've suffered a single-node HW failure (broker_id 4).
Every topic partition had at least 2 replicas, but some
partitions are now leaderless (all had their replicas on 4,5):


Topic: topic.with.two.replicas     Partition: 0    Leader: -1
Replicas: 4,5   Isr:

On broker 5, we see warnings like this in the logs:

/var/log/kafka/kafka.log.2:[2014-12-05 05:21:28,216] 19186668
[kafka-request-handler-4] WARN  kafka.server.ReplicaManager  -
[Replica Manager on Broker 5]: While recording the follower position,
the partition [topic.with.two.replicas,0] hasn't been created, skip
updating leader HW

/var/log/kafka/kafka.log.2:[2014-12-05 05:21:28,219] 19186671
[kafka-request-handler-4] WARN  kafka.server.KafkaApis  - [KafkaApi-5]
Fetch request with correlation id 36397 from client
ReplicaFetcherThread-1-5 on partition [topic.with.two.replicas,0]
failed due to Topic topic.with.two.replicas either doesn't exist or is
in the process of being deleted

We also have some topics with 3 replicas that are now leaderless:

Topic:topic.with.three.replicas PartitionCount:6 ReplicationFactor:3 Configs:
Topic: topic.with.three.replicas Partition: 0 Leader: none Replicas: 3,1,2 Isr:

whose 'state' in zookeeper apparently disappeared:
'/brokers/topics/topic.with.three.replicas/partitions/3/state':
NoNodeError((), {})
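
To list every affected partition rather than spot-checking, the describe
output shown above could be filtered for lines whose leader is "-1" or
"none". The following is only a rough sketch: the function name
find_leaderless and the regular expression are illustrative assumptions,
not part of Kafka, and the pattern is based solely on the output format
pasted in this message.

```python
import re

# Matches one partition record of the describe output. Using \s+ between
# fields tolerates records that a mail client wrapped onto two lines.
# (Assumed format: taken from the output pasted above, not from a spec.)
PARTITION_RE = re.compile(
    r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(\S+)"
    r"\s+Replicas:\s*(\S+)\s+Isr:[ \t]*(\S*)"
)

# Both spellings of "no leader" appear in the output above.
NO_LEADER = {"-1", "none"}


def find_leaderless(describe_output):
    """Return (topic, partition, replicas) for each partition with no leader."""
    found = []
    for match in PARTITION_RE.finditer(describe_output):
        topic, partition, leader, replicas, _isr = match.groups()
        if leader.lower() in NO_LEADER:
            replica_ids = [int(r) for r in replicas.split(",")]
            found.append((topic, int(partition), replica_ids))
    return found
```

Feeding it the saved describe output as a single string yields one tuple
per leaderless partition, which can then be checked against the 'state'
znodes one by one.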

Our versions are:
kafka 0.8.1
zookeeper 3.4.5

From searching archives of this list, the recommended "fix"
is to blow away the topic(s) and recreate. At this point in time,
that's an option, but it's not really acceptable for a reliable
data pipeline. Are there options to repair specific partitions?

-neil
