Upvoting this problem. We are using the same version (0.8.2.0) and are seeing a similar issue.
The stacktrace we have seen is:

[2015-09-18 08:57:47,147] ERROR [Replica Manager on Broker 58]: Error when processing fetch request for partition [topic1,22] offset 19068459 from follower with correlation id 234437982. Possible cause: Request for offset 19068459 but we only have log segments
kafka.common.NotAssignedReplicaException: Leader 58 failed to record follower 31's position -1 since the replica is not recognized to be one of the assigned replicas 58 for partition [topic2,1]
        at kafka.server.ReplicaManager.updateReplicaLEOAndPartitionHW(ReplicaManager.scala:574)
        at kafka.server.KafkaApis$$anonfun$recordFollowerLogEndOffsets$2.apply(KafkaApis.scala:388)
        at kafka.server.KafkaApis$$anonfun$recordFollowerLogEndOffsets$2.apply(KafkaApis.scala:386)
        at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
        at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
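For cross-checking: the exception means the leader broker does not have the follower's broker id in its cached assigned-replica list for that partition. Below is a minimal sketch (assuming the 0.8.2 ZooKeeper layout; the object name DumpPartitionState and the zkConnect/topic/partition values are placeholders, not anything shipped with Kafka) that dumps what ZooKeeper actually holds for the partition, so it can be compared against what the leader is logging:

import org.I0Itec.zkclient.ZkClient
import kafka.utils.ZKStringSerializer

object DumpPartitionState extends App {
  val zkConnect = "localhost:2181"  // placeholder: your ZooKeeper connect string
  val topic     = "topic2"          // topic named in the exception
  val partition = 1                 // partition named in the exception

  // ZKStringSerializer ships with Kafka 0.8.2 and reads znodes as UTF-8 strings
  val zk = new ZkClient(zkConnect, 10000, 10000, ZKStringSerializer)
  try {
    // Assigned replicas, e.g. {"version":1,"partitions":{"1":[58,31], ...}}
    val assignment = zk.readData[String]("/brokers/topics/" + topic)
    // Leader and ISR, e.g. {"leader":58,"isr":[58,31], ...}
    val state = zk.readData[String](
      "/brokers/topics/" + topic + "/partitions/" + partition + "/state")
    println("assigned replicas: " + assignment)
    println("leader/ISR state:  " + state)
  } finally {
    zk.close()
  }
}

If ZooKeeper lists the follower in both the assignment and the ISR while the leader still throws NotAssignedReplicaException, that would suggest the leader's in-memory cache has diverged from ZooKeeper, which would be consistent with the restart behaviour described in the thread below.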
On Tue, Aug 25, 2015 at 4:26 AM, Simon Cooper <simon.coo...@featurespace.co.uk> wrote:

> This has happened again on the same system. Is anyone able to offer any
> pointers towards a possible cause? We have no idea what is wrong or how
> to stop it happening again.
>
> We need to diagnose and fix this issue soon, as we can't have a live
> system that randomly fails due to unknown causes!
>
> Thanks,
> SimonC
>
> -----Original Message-----
> From: Simon Cooper [mailto:simon.coo...@featurespace.co.uk]
> Sent: 17 August 2015 12:31
> To: users@kafka.apache.org
> Subject: Topic partitions randomly failed on live system
>
> Hi,
>
> We've had an issue on a live system (3 brokers, ~10 topics, some
> replicated, some partitioned) where a partition wasn't properly
> reassigned, causing several other partitions to go down.
>
> First, this exception happened on broker 1 (we weren't doing anything
> particular on the system at the time):
>
> ERROR [AddPartitionsListener on 1]: Error while handling add partitions
> for data path /brokers/topics/topic1
> (kafka.controller.PartitionStateMachine$AddPartitionsListener)
> java.util.NoSuchElementException: key not found: [topic1,0]
>         at scala.collection.MapLike$class.default(MapLike.scala:228)
>         at scala.collection.AbstractMap.default(Map.scala:58)
>         at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
>         at kafka.controller.ControllerContext$$anonfun$replicasForPartition$1.apply(KafkaController.scala:112)
>         at kafka.controller.ControllerContext$$anonfun$replicasForPartition$1.apply(KafkaController.scala:111)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>         at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>         at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:47)
>         at scala.collection.SetLike$class.map(SetLike.scala:93)
>         at scala.collection.AbstractSet.map(Set.scala:47)
>         at kafka.controller.ControllerContext.replicasForPartition(KafkaController.scala:111)
>         at kafka.controller.KafkaController.onNewPartitionCreation(KafkaController.scala:485)
>         at kafka.controller.PartitionStateMachine$AddPartitionsListener$$anonfun$handleDataChange$1.apply$mcV$sp(PartitionStateMachine.scala:530)
>         at kafka.controller.PartitionStateMachine$AddPartitionsListener$$anonfun$handleDataChange$1.apply(PartitionStateMachine.scala:519)
>         at kafka.controller.PartitionStateMachine$AddPartitionsListener$$anonfun$handleDataChange$1.apply(PartitionStateMachine.scala:519)
>         at kafka.utils.Utils$.inLock(Utils.scala:535)
>         at kafka.controller.PartitionStateMachine$AddPartitionsListener.handleDataChange(PartitionStateMachine.scala:518)
>         at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)
>         at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>
> At this point, broker 2 started continually spamming these messages
> (mentioning other topics, not just topic1):
>
> ERROR [ReplicaFetcherThread-0-1], Error for partition [othertopic1,2] to
> broker 1:class kafka.common.UnknownException
> (kafka.server.ReplicaFetcherThread)
> ERROR [ReplicaFetcherThread-0-1], Error for partition [othertopic2,0] to
> broker 1:class kafka.common.UnknownException
> (kafka.server.ReplicaFetcherThread)
> ERROR [ReplicaFetcherThread-0-1], Error for partition [othertopic3,0] to
> broker 1:class kafka.common.UnknownException
> (kafka.server.ReplicaFetcherThread)
> ERROR [ReplicaFetcherThread-0-1], Error for partition [topic1,0] to
> broker 1:class kafka.common.UnknownException
> (kafka.server.ReplicaFetcherThread)
>
> And broker 1 had these messages, but only for topic1:
>
> ERROR [KafkaApi-1] error when handling request Name: FetchRequest;
> Version: 0; CorrelationId: 41182755; ClientId: ReplicaFetcherThread-0-1;
> ReplicaId: 2; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo: [topic1,0]
> -> PartitionFetchInfo(0,1048576) (kafka.server.KafkaApis)
> kafka.common.NotAssignedReplicaException: Leader 1 failed to record
> follower 2's position 0 since the replica is not recognized to be one of
> the assigned replicas 1 for partition [topic1,0]
>         at kafka.server.ReplicaManager.updateReplicaLEOAndPartitionHW(ReplicaManager.scala:574)
>         at kafka.server.KafkaApis$$anonfun$recordFollowerLogEndOffsets$2.apply(KafkaApis.scala:388)
>         at kafka.server.KafkaApis$$anonfun$recordFollowerLogEndOffsets$2.apply(KafkaApis.scala:386)
>         at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>         at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>         at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>         at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
>         at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>         at scala.collection.MapLike$MappedValues.foreach(MapLike.scala:245)
>         at kafka.server.KafkaApis.recordFollowerLogEndOffsets(KafkaApis.scala:386)
>         at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:351)
>         at kafka.server.KafkaApis.handle(KafkaApis.scala:60)
>         at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:59)
>         at java.lang.Thread.run(Thread.java:745)
>
> At this time, any topic that had broker 1 as its leader was not working.
> ZK thought that everything was OK and in sync.
>
> Restarting broker 1 fixed the broken topics for a bit, until broker 1 was
> reassigned as leader of some topics, at which point it broke again.
> Restarting broker 2 fixed it (!!!!).
>
> We're using kafka_2.10-0.8.2.0. Could anyone explain what happened, and
> (most importantly) how we stop it happening again in the future?
>
> Many thanks,
> SimonC

--
Chen Song