One of the Kafka brokers (broker 1) in our cluster went down, and we
ran some reassignments to move partitions off the dead broker. There
were problems during the reassignment, and it brought down the broker
(broker 2) to which the old partitions were being reassigned.
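For context, the reassignment was driven with the stock kafka-reassign-partitions.sh tool; the JSON plan we fed it was roughly of this form (topic name elided as in the logs below, and the replica list here is illustrative, not the exact one we used):

```json
{
  "version": 1,
  "partitions": [
    {"topic": "<topic>", "partition": 13, "replicas": [2, 3]}
  ]
}
```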
On restarting the broker I see the following exception in the server logs:
kafka.common.NotAssignedReplicaException: Leader 1 failed to record
follower 2's position 5637384 for partition [<topic>,13] since the
replica 2 is not recognized to be one of the assigned replicas for
partition [<topic>,13]
        at kafka.cluster.Partition.updateLeaderHWAndMaybeExpandIsr(Partition.scala:231)
        at kafka.server.ReplicaManager.recordFollowerPosition(ReplicaManager.scala:432)
        at kafka.server.KafkaApis$$anonfun$maybeUpdatePartitionHw$2.apply(KafkaApis.scala:460)
        at kafka.server.KafkaApis$$anonfun$maybeUpdatePartitionHw$2.apply(KafkaApis.scala:458)
        at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:178)
        at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:347)
        at kafka.server.KafkaApis.maybeUpdatePartitionHw(KafkaApis.scala:458)
        at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:424)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:186)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:42)
The following is the replica assignment for the partition:

Topic     Partition  Leader  Replicas   ISRs
<topic>   13         3       [1, 2, 3]  [3]
As can be seen from the data above, broker 3 is the leader for the
partition. But the exception message shows that broker 2's replica
fetcher still assumes broker 1 to be the leader.
Broker 1 was the first broker to go down. After that, a reassignment
to broker 2 was attempted and failed. I understand that some offset
checkpointing got messed up. Is there any way around this, and have
any of you encountered `kafka.common.NotAssignedReplicaException` before?
Thanks,
Pradeep