Hard to say, but if your producers keep producing data and everything is working well, then you probably don't need to.
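If you want to double-check, ZooKeeper records the current controller in the /controller znode, so reading it should show a single broker id. A rough, untested sketch using the kazoo Python client (the library choice and the ZooKeeper address are my assumptions, not something from this thread):

    # Untested sketch: read the controller znode to see which broker id currently holds the role.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="zk1.example.com:2181")   # assumption: your ZooKeeper ensemble
    zk.start()
    data, _ = zk.get("/controller")                  # JSON along the lines of {"version":1,"brokerid":25,...}
    print(data.decode("utf-8"))
    zk.stop()

If more than one broker still claims the role after the bounce, that would point back at the earlier double-controller state.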
On 4/21/15, 5:34 PM, "Wesley Chow" <w...@chartbeat.com> wrote:

>There is only one broker that thinks it's the controller right now. The
>double controller situation happened earlier this morning. Do the other
>brokers have to be bounced after the controller situation is fixed? I did
>not do that for all brokers.
>
>Wes
>
> On Apr 21, 2015 8:25 PM, "Jiangjie Qin" <j...@linkedin.com.invalid> wrote:
>
>> Yes, it should be broker 25, thread 0, from the log.
>> This needs to be resolved; you might need to bounce both of the brokers
>> that think they are the controller. The new controller should then be
>> able to continue the partition reassignment.
>>
>> From: Wes Chow <w...@chartbeat.com>
>> Reply-To: "users@kafka.apache.org" <users@kafka.apache.org>
>> Date: Tuesday, April 21, 2015 at 1:29 PM
>> To: "users@kafka.apache.org" <users@kafka.apache.org>
>> Subject: Re: partition reassignment stuck
>>
>> Quick clarification: you say broker 0, but do you actually mean broker 25?
>> 25, one of the replicas for the partition, is currently the one having
>> trouble getting into sync, and 28 is the leader for the partition.
>>
>> Unfortunately, the logs have rotated off so I can't get to what happened
>> around then. However, there was a period of a few hours where we had
>> two brokers that both believed they were controllers. I'm not sure why I
>> didn't think of this before.
>>
>> ZooKeeper data appears to be inconsistent at the moment.
>> /brokers/topics/click_engage says that partition 116's replica set is
>> [4, 7, 25]. /brokers/topics/click_engage/partitions/116/state says the
>> leader is 28 and the ISR is [28, 15]. Does this need to be resolved, and
>> if so, how?
>>
>> Thanks,
>> Wes
>>
>> Jiangjie Qin <j...@linkedin.com.INVALID>
>> April 21, 2015 at 2:24 PM
>>
>> This means that broker 0 thought broker 28 was the leader for that
>> partition, but broker 28 has actually already received a StopReplicaRequest
>> from the controller and stopped serving as a replica for that partition.
>> This can happen transiently; broker 0 will be able to find the new
>> leader for the partition once it receives a LeaderAndIsrRequest from the
>> controller with the new leader information. If these messages keep getting
>> logged for a long time, then there might be an issue.
>> Can you check around timestamp [2015-04-21 12:15:36,585] on broker 28 to
>> see if there is an error log? The error log might not have the partition
>> info included.
>>
>> From: Wes Chow <w...@chartbeat.com>
>> Reply-To: "users@kafka.apache.org" <users@kafka.apache.org>
>> Date: Tuesday, April 21, 2015 at 10:50 AM
>> To: "users@kafka.apache.org" <users@kafka.apache.org>
>> Subject: Re: partition reassignment stuck
>>
>> Not for that particular partition, but I am seeing these errors on 28:
>>
>> kafka.common.NotAssignedReplicaException: Leader 28 failed to record
>> follower 25's position 0 for partition [click_engage,116] since the replica
>> 25 is not recognized to be one of the assigned replicas for partition
>> [click_engage,116]
>>     at kafka.cluster.Partition.updateLeaderHWAndMaybeExpandIsr(Partition.scala:231)
>>     at kafka.server.ReplicaManager.recordFollowerPosition(ReplicaManager.scala:432)
>>     at kafka.server.KafkaApis$$anonfun$maybeUpdatePartitionHw$2.apply(KafkaApis.scala:460)
>>     at kafka.server.KafkaApis$$anonfun$maybeUpdatePartitionHw$2.apply(KafkaApis.scala:458)
>>     at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:176)
>>     at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:345)
>>     at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:345)
>>     at kafka.server.KafkaApis.maybeUpdatePartitionHw(KafkaApis.scala:458)
>>     at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:424)
>>     at kafka.server.KafkaApis.handle(KafkaApis.scala:186)
>>     at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:42)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>> What does this mean?
>>
>> Thanks!
>> Wes
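As an aside on the ZooKeeper inconsistency Wes describes above (replica set [4, 7, 25] in the topic znode vs. leader 28 / ISR [28, 15] in the partition state znode), an untested kazoo sketch like the following dumps both znodes for partition 116 so they can be compared in one place. The paths and partition number come from his message; the client library and host are assumptions:

    # Untested sketch: compare the assigned replica list with the leader/ISR state for one partition.
    import json
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="zk1.example.com:2181")   # assumption: your ZooKeeper ensemble
    zk.start()
    assignment, _ = zk.get("/brokers/topics/click_engage")
    state, _ = zk.get("/brokers/topics/click_engage/partitions/116/state")
    # In 0.8.x the topic znode looks like {"version":1,"partitions":{"116":[4,7,25],...}}
    print("assigned replicas:", json.loads(assignment)["partitions"]["116"])
    print("leader/ISR state: ", json.loads(state))
    zk.stop()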
>>
>> Jiangjie Qin <j...@linkedin.com.INVALID>
>> April 21, 2015 at 1:19 PM
>>
>> Those 00000000000000000000.index files are for different partitions and
>> are expected to be generated when new replicas are assigned to the broker.
>> We would want to know what caused the UnknownException. Did you see any
>> error log on broker 28?
>>
>> Jiangjie (Becket) Qin
>>
>> Wes Chow <w...@chartbeat.com>
>> April 21, 2015 at 12:16 PM
>>
>> I started a partition reassignment (this is a 0.8.1.1 cluster) some time
>> ago and it seems to be stuck. Partitions are no longer getting moved
>> around, but the cluster seems to be operational otherwise. The stuck
>> nodes have a lot of 00000000000000000000.index files, and their logs show
>> errors like:
>>
>> [2015-04-21 12:15:36,585] 3237789 [ReplicaFetcherThread-0-28] ERROR
>> kafka.server.ReplicaFetcherThread - [ReplicaFetcherThread-0-28], Error for
>> partition [pings,227] to broker 28:class kafka.common.UnknownException
>>
>> I'm at a loss as to what I should be looking at. Any ideas?
>>
>> Thanks,
>> Wes
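On the original question of the reassignment being stuck: kafka-reassign-partitions.sh --verify with the original reassignment JSON file is the supported way to check progress. As far as I recall, in 0.8.x the in-flight reassignment also lives in the /admin/reassign_partitions znode, which the controller deletes once it finishes, so the untested kazoo sketch below just peeks at that znode directly (the znode behaviour is my recollection; host and client library are assumptions):

    # Untested sketch: check whether the cluster still records a reassignment as in flight.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="zk1.example.com:2181")   # assumption: your ZooKeeper ensemble
    zk.start()
    if zk.exists("/admin/reassign_partitions"):
        pending, _ = zk.get("/admin/reassign_partitions")
        print("reassignment still pending:", pending.decode("utf-8"))
    else:
        print("no reassignment in flight")
    zk.stop()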