Hi,

During a round of Kafka data discrepancy investigation I came across a bunch of recurring errors, shown below:
producer.log:

  2015-06-14 13:06:25,591 WARN [task-thread-9] (k.p.a.DefaultEventHandler:83) - Produce request with correlation id 624 failed due to [mytopic,21]: kafka.common.NotLeaderForPartitionException

kafka.log:

  [2015-06-14 13:05:13,025] 418953499 [request-expiration-task] WARN kafka.server.ReplicaManager - [Replica Manager on Broker 61]: Fetch request with correlation id 1 from client fetchReq on partition [mytopic,21] failed due to Leader not local for partition [mytopic,21] on broker 61

state-change.log:

  [2015-06-14 13:05:11,495] WARN Broker 29 ignoring LeaderAndIsr request from controller 45 with correlation id 41799 epoch 27 for partition [mytopic,21] since its associated leader epoch 191 is old. Current leader epoch is 191 (state.change.logger)

These warnings repeat several times a day, and sometimes they coincide with the timestamps of presumably missing records. As far as I understand, an occasional NotLeaderForPartitionException is fine, but does the same apply to the "old leader epoch" warning? Could it be caused by a ZooKeeper issue? However, I don't see anything particularly interesting in the ZK logs, except "likely client has closed socket" or "Unexpected Exception: java.nio.channels.CancelledKeyException". The former is all over the log (must be a client issue), and the latter are rare and not correlated with the original warnings.

Thanks,

Here are some bits of configuration:

  Kafka 0.8.1.2, 3 brokers + 3 zk, 2x replication
  request.required.acks=1
  retry.backoff.ms=1000
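In case it helps, the producer is set up roughly along these lines (a minimal sketch using the 0.8 producer API; the broker list, topic name, and serializer are placeholders, and anything not listed in the configuration above is left at its default):

  import java.util.Properties;

  import kafka.javaapi.producer.Producer;
  import kafka.producer.KeyedMessage;
  import kafka.producer.ProducerConfig;

  public class ProducerSketch {
      public static void main(String[] args) {
          Properties props = new Properties();
          // Placeholder broker list; the real cluster has 3 brokers.
          props.put("metadata.broker.list", "broker1:9092,broker2:9092,broker3:9092");
          props.put("serializer.class", "kafka.serializer.StringEncoder");
          // acks=1: only the partition leader acknowledges, so a message can be
          // lost if the leader fails before followers have copied it.
          props.put("request.required.acks", "1");
          // Wait 1s before retrying after a failure such as
          // NotLeaderForPartitionException.
          props.put("retry.backoff.ms", "1000");
          // Left at the 0.8.x default number of resends.
          props.put("message.send.max.retries", "3");

          Producer<String, String> producer =
                  new Producer<String, String>(new ProducerConfig(props));
          producer.send(new KeyedMessage<String, String>("mytopic", "key", "value"));
          producer.close();
      }
  }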