I'm seeing a few (50+ in a couple of hours) warning messages like this
2015-10-30 06:22:11,086 WARN kafka.utils.Logging$class:83
[kafka-request-handler-0] [warn] Broker 175 ignoring LeaderAndIsr request
from controller 175 with correlation id 18359 epoch 11 for partition
[mytpoic,1337] since its associated leader epoch 6 is old. Current leader
epoch is 6
This message confuses me, because the associated epoch and current leader
epoch are both 6. My understanding was that a message should be processed
if the request came from the current leader (or higher).
Reviewing the code in ReplicaManager.scala:613
if (partitionLeaderEpoch <
partitionStateInfo.leaderIsrAndControllerEpoch.leaderAndIsr.leaderEpoch) {
if(partitionStateInfo.allReplicas.contains(config.brokerId))
partitionState.put(partition, partitionStateInfo)
else {
stateChangeLogger.warn(("Broker %d ignoring LeaderAndIsr
request from controller %d with correlation id %d " +
"epoch %d for partition [%s,%d] as itself is not in
assigned replica list %s")
.format(localBrokerId, controllerId, correlationId,
leaderAndISRRequest.controllerEpoch,
topic, partition.partitionId,
partitionStateInfo.allReplicas.mkString(",")))
}
} else {
// Otherwise record the error code in response
stateChangeLogger.warn(("Broker %d ignoring LeaderAndIsr
request from controller %d with correlation id %d " +
"epoch %d for partition [%s,%d] since its associated leader
epoch %d is old. Current leader epoch is %d")
.format(localBrokerId, controllerId, correlationId,
leaderAndISRRequest.controllerEpoch,
topic, partition.partitionId,
partitionStateInfo.leaderIsrAndControllerEpoch.leaderAndIsr.leaderEpoch,
partitionLeaderEpoch))
responseMap.put((topic, partitionId),
ErrorMapping.StaleLeaderEpochCode)
}
}
I would have expected the first if condition, should be inverted to
partitionStateInfo.leaderIsrAndControllerEpoch.leaderAndIsr.leaderEpoch <
partitionLeadEpoch, so that it matches the other epoch checks within this
class. The other epoch checks are for the controller not leader, but I
suspect the logic that compares the epoch in the LeaderAndIsr request to
the local known value of the epoch should do the same thing.
I don't understand how this could be a bug, because it seems to be a
critical piece of code - therefore what is the warning message trying to
tell me that is wrong?
Thanks,
Jonathan