Thanks! Would there be any difference if I instead deleted all the Kafka data from ZooKeeper and booted 3 instances with different broker ids? Would clients with cached broker id lists (or anything else) be an issue?
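Concretely, something like the following is what deleting the Kafka data from ZooKeeper would look like. This is only a rough sketch: it assumes no chroot prefix, a roughly 0.8-style znode layout, and a ZooKeeper CLI that supports rmr; the exact set of znodes varies by Kafka version, and all topic and offset metadata goes with them.

    # stop the old brokers and all producers/consumers first
    ./bin/zookeeper-shell.sh $ZK      # $ZK = the ZooKeeper connect string
    # then, at the zookeeper-shell prompt:
    rmr /brokers                      # broker registrations and topic/partition state
    rmr /controller
    rmr /controller_epoch
    rmr /admin
    rmr /consumers                    # consumer offsets, if they are kept in ZK
    # afterwards boot 3 fresh brokers (new broker.id values) and recreate the topics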
Sent from my iPhone

On Mar 22, 2013, at 9:15 PM, Jun Rao <jun...@gmail.com> wrote:

> In scenario 2, you can bring up 3 new brokers with the same broker id. You
> won't get the data back. However, new data can be published to and consumed
> from the new brokers.
>
> Thanks,
>
> Jun
>
> On Fri, Mar 22, 2013 at 2:17 PM, Scott Clasen <sc...@heroku.com> wrote:
>
>> Thanks Neha-
>>
>> To clarify...
>>
>> * In scenario 1 => will the new broker get all messages on the other
>> brokers replicated to it?
>>
>> * In scenario 2 => it is clear that the data is gone, but I still need
>> producers to be able to send and consumers to receive on the same topic.
>> In my testing today I was unable to do that, as I kept getting errors...
>> so if I was doing the correct steps it seems there is a bug here;
>> basically the "second-cluster-topic" topic is unusable after all 3
>> brokers crash and 3 more are booted to replace them. Something not quite
>> correct in ZooKeeper?
>>
>> Like so:
>>
>> ./bin/kafka-reassign-partitions.sh --zookeeper ... --path-to-json-file reassign.json
>>
>> kafka.common.LeaderNotAvailableException: Leader not available for topic second-cluster-topic partition 0
>>     at kafka.admin.AdminUtils$$anonfun$3.apply(AdminUtils.scala:120)
>>     at kafka.admin.AdminUtils$$anonfun$3.apply(AdminUtils.scala:103)
>>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
>>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
>>     at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
>>     at scala.collection.immutable.List.foreach(List.scala:45)
>>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:206)
>>     at scala.collection.immutable.List.map(List.scala:45)
>>     at kafka.admin.AdminUtils$.kafka$admin$AdminUtils$$fetchTopicMetadataFromZk(AdminUtils.scala:103)
>>     at kafka.admin.AdminUtils$.fetchTopicMetadataFromZk(AdminUtils.scala:92)
>>     at kafka.admin.ListTopicCommand$.showTopic(ListTopicCommand.scala:80)
>>     at kafka.admin.ListTopicCommand$$anonfun$main$2.apply(ListTopicCommand.scala:66)
>>     at kafka.admin.ListTopicCommand$$anonfun$main$2.apply(ListTopicCommand.scala:65)
>>     at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
>>     at scala.collection.immutable.List.foreach(List.scala:45)
>>     at kafka.admin.ListTopicCommand$.main(ListTopicCommand.scala:65)
>>     at kafka.admin.ListTopicCommand.main(ListTopicCommand.scala)
>> Caused by: kafka.common.LeaderNotAvailableException: No leader exists for partition 0
>>     at kafka.admin.AdminUtils$$anonfun$3.apply(AdminUtils.scala:117)
>>     ... 16 more
>> topic: second-cluster-topic
>>
>> ./bin/kafka-preferred-replica-election.sh --zookeeper ... --path-to-json-file elect.json
>>
>> .... [2013-03-22 10:24:20,706] INFO Created preferred replica election path with
>> { "partitions":[ { "partition":0, "topic":"first-cluster-topic" }, { "partition":0, "topic":"second-cluster-topic" } ], "version":1 }
>> (kafka.admin.PreferredReplicaLeaderElectionCommand$)
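For reference, the elect.json fed to kafka-preferred-replica-election.sh above is presumably just the same partition list the tool logs when it creates the election path, i.e. roughly the following (field names assumed from the logged JSON; $ZK stands in for the elided ZooKeeper connect string):

    cat > elect.json <<'EOF'
    {"partitions":
       [
         {"topic": "first-cluster-topic",  "partition": 0},
         {"topic": "second-cluster-topic", "partition": 0}
       ]
    }
    EOF
    ./bin/kafka-preferred-replica-election.sh --zookeeper $ZK --path-to-json-file elect.json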
>> ./bin/kafka-list-topic.sh --zookeeper ... --topic second-cluster-topic
>>
>> [2013-03-22 10:24:30,869] ERROR Error while fetching metadata for partition [second-cluster-topic,0] (kafka.admin.AdminUtils$)
>> kafka.common.LeaderNotAvailableException: Leader not available for topic second-cluster-topic partition 0
>>     at kafka.admin.AdminUtils$$anonfun$3.apply(AdminUtils.scala:120)
>>     at kafka.admin.AdminUtils$$anonfun$3.apply(AdminUtils.scala:103)
>>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
>>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
>>     at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
>>     at scala.collection.immutable.List.foreach(List.scala:45)
>>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:206)
>>     at scala.collection.immutable.List.map(List.scala:45)
>>     at kafka.admin.AdminUtils$.kafka$admin$AdminUtils$$fetchTopicMetadataFromZk(AdminUtils.scala:103)
>>     at kafka.admin.AdminUtils$.fetchTopicMetadataFromZk(AdminUtils.scala:92)
>>     at kafka.admin.ListTopicCommand$.showTopic(ListTopicCommand.scala:80)
>>     at kafka.admin.ListTopicCommand$$anonfun$main$2.apply(ListTopicCommand.scala:66)
>>     at kafka.admin.ListTopicCommand$$anonfun$main$2.apply(ListTopicCommand.scala:65)
>>     at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
>>     at scala.collection.immutable.List.foreach(List.scala:45)
>>     at kafka.admin.ListTopicCommand$.main(ListTopicCommand.scala:65)
>>     at kafka.admin.ListTopicCommand.main(ListTopicCommand.scala)
>> Caused by: kafka.common.LeaderNotAvailableException: No leader exists for partition 0
>>     at kafka.admin.AdminUtils$$anonfun$3.apply(AdminUtils.scala:117)
>>     ... 16 more
>>
>> On Fri, Mar 22, 2013 at 1:12 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
>>
>>> * Scenario 1: BrokerID 1,2,3 Broker 2 dies.
>>>
>>> Here, you can use the reassign-partitions tool and, for all partitions that
>>> had a replica on broker 2, move it to broker 4.
>>>
>>> * Scenario 2: BrokerID 1,2,3 Catastrophic failure, 1,2,3 die but ZK is still
>>> there.
>>>
>>> There is no way to recover any data here since there is nothing
>>> available to consume data from.
>>>
>>> Thanks,
>>> Neha
>>>
>>> On Fri, Mar 22, 2013 at 10:46 AM, Scott Clasen <sc...@heroku.com> wrote:
>>>
>>>> What would the recommended practice be for the following scenarios?
>>>>
>>>> Running on EC2, ephemeral disks only for Kafka.
>>>>
>>>> There are 3 Kafka servers. The broker ids are always increasing. If a
>>>> broker dies it's never coming back.
>>>>
>>>> All topics have a replication factor of 3.
>>>>
>>>> * Scenario 1: BrokerID 1,2,3 Broker 2 dies.
>>>>
>>>> Recover by:
>>>>
>>>> Boot another: BrokerID 4
>>>> ?? run bin/kafka-reassign-partitions.sh for any topic+partition and
>>>> replace brokerid 2 with brokerid 4
>>>> ?? anything else to do to cause messages to be replicated to 4??
>>>>
>>>> NOTE: This appears to work, but I'm not positive 4 got messages
>>>> replicated to it.
>>>>
>>>> * Scenario 2: BrokerID 1,2,3 Catastrophic failure, 1,2,3 die but ZK is
>>>> still there.
>>>>
>>>> Messages obviously lost.
>>>> Recover to a functional state by:
>>>>
>>>> Boot 3 more: 4,5,6
>>>> ?? run bin/kafka-reassign-partitions.sh for all topics/partitions, swap
>>>> 1,2,3 for 4,5,6?
>>>> ?? run bin/kafka-preferred-replica-election.sh for all topics/partitions
>>>> ?? anything else to do to allow producers to start sending successfully??
>>>>
>>>> NOTE: I had some trouble with scenario 2. Will try to reproduce and open a
>>>> ticket, if in fact my procedures for scenario 2 are correct and I still
>>>> can't get to a good state.
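Referring back to the scenario 1 and scenario 2 steps quoted above: reassign.json is just a per-partition list of desired replica assignments, so scenario 1 amounts to swapping broker 2 for broker 4 in each affected replica list, and scenario 2 to swapping 1,2,3 for 4,5,6 everywhere. A rough sketch of what that might look like, assuming single-partition topics named as in this thread and 0.8-era JSON field names (newer builds of the tool also expect a "version" field), with $ZK standing in for the elided ZooKeeper connect string:

    # Scenario 1: broker 2 died; put broker 4 into its place in the replica list
    cat > reassign.json <<'EOF'
    {"partitions":
       [
         {"topic": "second-cluster-topic", "partition": 0, "replicas": [1,4,3]}
       ]
    }
    EOF
    ./bin/kafka-reassign-partitions.sh --zookeeper $ZK --path-to-json-file reassign.json

    # Scenario 2: brokers 1,2,3 are gone for good; move every partition onto 4,5,6
    cat > reassign.json <<'EOF'
    {"partitions":
       [
         {"topic": "first-cluster-topic",  "partition": 0, "replicas": [4,5,6]},
         {"topic": "second-cluster-topic", "partition": 0, "replicas": [4,5,6]}
       ]
    }
    EOF
    ./bin/kafka-reassign-partitions.sh --zookeeper $ZK --path-to-json-file reassign.json
    # followed by kafka-preferred-replica-election.sh (elect.json above), which is
    # exactly the sequence that produced the LeaderNotAvailableException in this thread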