I'm seeing a few (50+ in a couple of hours) warning messages like this 2015-10-30 06:22:11,086 WARN kafka.utils.Logging$class:83 [kafka-request-handler-0] [warn] Broker 175 ignoring LeaderAndIsr request from controller 175 with correlation id 18359 epoch 11 for partition [mytpoic,1337] since its associated leader epoch 6 is old. Current leader epoch is 6
This message confuses me, because the associated epoch and current leader epoch are both 6. My understanding was that a message should be processed if the request came from the current leader (or higher). Reviewing the code in ReplicaManager.scala:613 if (partitionLeaderEpoch < partitionStateInfo.leaderIsrAndControllerEpoch.leaderAndIsr.leaderEpoch) { if(partitionStateInfo.allReplicas.contains(config.brokerId)) partitionState.put(partition, partitionStateInfo) else { stateChangeLogger.warn(("Broker %d ignoring LeaderAndIsr request from controller %d with correlation id %d " + "epoch %d for partition [%s,%d] as itself is not in assigned replica list %s") .format(localBrokerId, controllerId, correlationId, leaderAndISRRequest.controllerEpoch, topic, partition.partitionId, partitionStateInfo.allReplicas.mkString(","))) } } else { // Otherwise record the error code in response stateChangeLogger.warn(("Broker %d ignoring LeaderAndIsr request from controller %d with correlation id %d " + "epoch %d for partition [%s,%d] since its associated leader epoch %d is old. Current leader epoch is %d") .format(localBrokerId, controllerId, correlationId, leaderAndISRRequest.controllerEpoch, topic, partition.partitionId, partitionStateInfo.leaderIsrAndControllerEpoch.leaderAndIsr.leaderEpoch, partitionLeaderEpoch)) responseMap.put((topic, partitionId), ErrorMapping.StaleLeaderEpochCode) } } I would have expected the first if condition, should be inverted to partitionStateInfo.leaderIsrAndControllerEpoch.leaderAndIsr.leaderEpoch < partitionLeadEpoch, so that it matches the other epoch checks within this class. The other epoch checks are for the controller not leader, but I suspect the logic that compares the epoch in the LeaderAndIsr request to the local known value of the epoch should do the same thing. I don't understand how this could be a bug, because it seems to be a critical piece of code - therefore what is the warning message trying to tell me that is wrong? Thanks, Jonathan