I'm seeing quite a few warning messages (50+ in a couple of hours) like this:

2015-10-30 06:22:11,086 WARN  kafka.utils.Logging$class:83
[kafka-request-handler-0] [warn] Broker 175 ignoring LeaderAndIsr request
from controller 175 with correlation id 18359 epoch 11 for partition
[mytpoic,1337] since its associated leader epoch 6 is old. Current leader
epoch is 6

This message confuses me, because the associated epoch and the current leader
epoch are both 6. My understanding was that a request should be processed
if it carries the current leader epoch (or a higher one).

Reviewing the code at ReplicaManager.scala:613, I see:

    if (partitionLeaderEpoch < partitionStateInfo.leaderIsrAndControllerEpoch.leaderAndIsr.leaderEpoch) {
      if(partitionStateInfo.allReplicas.contains(config.brokerId))
        partitionState.put(partition, partitionStateInfo)
      else {
        stateChangeLogger.warn(("Broker %d ignoring LeaderAndIsr request from controller %d with correlation id %d " +
          "epoch %d for partition [%s,%d] as itself is not in assigned replica list %s")
          .format(localBrokerId, controllerId, correlationId, leaderAndISRRequest.controllerEpoch,
          topic, partition.partitionId, partitionStateInfo.allReplicas.mkString(",")))
      }
    } else {
      // Otherwise record the error code in response
      stateChangeLogger.warn(("Broker %d ignoring LeaderAndIsr request from controller %d with correlation id %d " +
        "epoch %d for partition [%s,%d] since its associated leader epoch %d is old. Current leader epoch is %d")
        .format(localBrokerId, controllerId, correlationId, leaderAndISRRequest.controllerEpoch,
        topic, partition.partitionId, partitionStateInfo.leaderIsrAndControllerEpoch.leaderAndIsr.leaderEpoch,
        partitionLeaderEpoch))
      responseMap.put((topic, partitionId), ErrorMapping.StaleLeaderEpochCode)
    }
    }
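
To make the comparison concrete, here is a minimal sketch of how I read the
check, reduced to the epoch comparison alone. It assumes that
partitionLeaderEpoch is the epoch the broker already knows locally and that
the other value is the epoch carried in the LeaderAndIsr request (this is what
the format arguments of the warning suggest). EpochCheck, shouldApply,
localEpoch and requestEpoch are names I made up for illustration; they are not
Kafka APIs.

```scala
// Hypothetical, simplified model of the leader-epoch check quoted above.
// None of these names exist in Kafka; they only illustrate the comparison.
object EpochCheck {
  // Mirrors the strict `<` in the quoted condition: the request is applied
  // only when its leader epoch is strictly newer than the locally known one.
  def shouldApply(localEpoch: Int, requestEpoch: Int): Boolean =
    localEpoch < requestEpoch

  def main(args: Array[String]): Unit = {
    println(shouldApply(localEpoch = 5, requestEpoch = 6)) // true: newer epoch
    println(shouldApply(localEpoch = 6, requestEpoch = 6)) // false: equal epochs, as in my log
    println(shouldApply(localEpoch = 7, requestEpoch = 6)) // false: older epoch
  }
}
```

If this reading is right, a request carrying epoch 6 while the broker already
knows epoch 6 falls into the else branch, which would explain why the warning
prints the same value twice.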

I would have expected the first if condition to be inverted to
partitionStateInfo.leaderIsrAndControllerEpoch.leaderAndIsr.leaderEpoch <
partitionLeaderEpoch, so that it matches the other epoch checks within this
class. Those other checks compare the controller epoch rather than the leader
epoch, but I suspect the logic that compares the epoch in the LeaderAndIsr
request to the locally known epoch should work the same way.

I doubt this is actually a bug, because it is such a critical piece of code.
So what is the warning message trying to tell me is wrong?

Thanks,

Jonathan
