We are running a 3-node deployment of Kafka, and on several of our testing sites we have seen the following scenario occur:
- "auto.offset.reset" is set to "earliest".
- A client is reading from Kafka, and at some point the broker throws an OffsetOutOfRangeException, causing the consumer to seek to the beginning of the partition. The client consumer is nowhere near the end of the partition at this point; there are thousands more messages after this offset.
- At the same point in time, Kafka undergoes a leader transition.

It seems like the partition leader is incorrectly determining the length of the partition during this leader transition phase, which causes it to think that a valid offset is past the end of the partition. Is this a known issue?

Log comparison:

client.log
2016-08-10 08:06:22,612 [Client] INFO org.apache.kafka.clients.consumer.internals.Fetcher - Fetch offset 11891 is out of range, resetting offset

kafka server.log
2016-08-10 08:06:20,713 INFO New leader is 1001 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
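
To put a number on "at the same point in time", here is a small sketch that computes the gap between the two log lines quoted above. It assumes the client and broker clocks are synchronized, which we have not verified; the helper names (parse_log_time, seconds_between) are only for illustration and are not part of Kafka.

```python
from datetime import datetime

# Timestamps copied from the two log excerpts quoted in this post.
LEADER_CHANGE = "2016-08-10 08:06:20,713"  # kafka server.log: "New leader is 1001"
OFFSET_RESET = "2016-08-10 08:06:22,612"   # client.log: "Fetch offset 11891 is out of range"

# Log timestamp layout: date, time, comma, milliseconds.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_time(stamp):
    """Parse one log timestamp (hypothetical helper)."""
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


def seconds_between(earlier, later):
    """Return the elapsed seconds from `earlier` to `later` (hypothetical helper)."""
    return (parse_log_time(later) - parse_log_time(earlier)).total_seconds()


# Assumption: both hosts have synchronized clocks, so the difference is meaningful.
gap = seconds_between(LEADER_CHANGE, OFFSET_RESET)
print(f"offset reset logged {gap:.3f} s after the leader change")
```

Under that assumption, the client's offset reset was logged about 1.9 seconds after the leader change appeared in the server log.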