It looks like none of your replicas are in-sync. Did you enable unclean leader election? That allows one of the out-of-sync replicas to become leader, potentially losing data but keeping the topic available.
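For reference, on broker versions that expose it (I believe 0.8.2 and later), this behavior is controlled by the unclean.leader.election.enable property in server.properties, roughly:

  # allow a replica that is not in the ISR to be elected leader
  # (restores availability at the cost of possibly losing unreplicated messages)
  unclean.leader.election.enable=true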
Gwen

On Tue, Dec 9, 2014 at 8:43 AM, Neil Harkins <nhark...@gmail.com> wrote:
> Hi. We've suffered a single node HW failure (broker_id 4)
> with at least 2 replicas of each topic partition, but some
> topic parts are now leaderless (all were across 4,5):
>
> Topic: topic.with.two.replicas Partition: 0 Leader: -1
> Replicas: 4,5 Isr:
>
> on broker 5, we see warnings like this in the logs:
>
> /var/log/kafka/kafka.log.2:[2014-12-05 05:21:28,216] 19186668
> [kafka-request-handler-4] WARN kafka.server.ReplicaManager -
> [Replica Manager on Broker 5]: While recording the follower position,
> the partition [topic.with.two.replicas,0] hasn't been created, skip
> updating leader HW
>
> /var/log/kafka/kafka.log.2:[2014-12-05 05:21:28,219] 19186671
> [kafka-request-handler-4] WARN kafka.server.KafkaApis - [KafkaApi-5]
> Fetch request with correlation id 36397 from client
> ReplicaFetcherThread-1-5 on partition [topic.with.two.replicas,0]
> failed due to Topic topic.with.two.replicas either doesn't exist or is
> in the process of being deleted
>
> We also have some topics which had 3 replicas also now leaderless:
>
> Topic:topic.with.three.replicas PartitionCount:6 ReplicationFactor:3
> Configs:
> Topic: topic.with.three.replicas Partition: 0 Leader: none Replicas: 3,1,2
> Isr:
>
> whose 'state' in zookeeper apparently disappeared:
> '/brokers/topics/topic.with.three.replicas/partitions/3/state':
> NoNodeError((), {})
>
> Our versions are:
> kafka 0.8.1
> zookeeper 3.4.5
>
> From searching archives of this list, the recommended "fix"
> is to blow away the topic(s) and recreate. At this point in time,
> that's an option, but it's not really acceptable for a reliable
> data pipeline. Are there options to repair specific partitions?
>
> -neil
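One thing that may help while you debug: the per-partition state you quoted lives in ZooKeeper and can be read back directly with the stock ZooKeeper CLI, something along these lines (zk-host:2181 is a placeholder for one of your ZooKeeper servers):

  # prints the partition's leader/ISR record, or a NoNode error if the znode is gone
  bin/zkCli.sh -server zk-host:2181 \
    get /brokers/topics/topic.with.three.replicas/partitions/0/state

A healthy partition returns a small JSON blob with leader, leader_epoch and isr fields; a NoNode error means the znode really is missing, which matches what you're seeing.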