I'm running into strange behavior when testing failure scenarios. I have 4 brokers and 8 partitions for a topic called "feed". I wrote a piece of code that prints out the partitionId, leaderId, and earliest offset for each partition.
Here is the printed information about partition leaders and earliest offsets:

    partition:0 leader:0 offset: 1676913
    partition:1 leader:1 offset: 0
    partition:2 leader:2 offset: 0
    partition:3 leader:0 offset: 1676760
    partition:4 leader:0 offset: 1676635
    partition:5 leader:1 offset: 0
    partition:6 leader:2 offset: 0
    partition:7 leader:0 offset: 1676101

I then kill broker 0 (using kill <pid>) and re-run my program:

    partition:0 leader:1 offset: 0
    partition:1 leader:1 offset: 0
    partition:2 leader:2 offset: 0
    partition:3 leader:3 offset: 0
    partition:4 leader:1 offset: 0
    partition:5 leader:1 offset: 0
    partition:6 leader:2 offset: 0
    partition:7 leader:1 offset: 0

As you can see, the leader has changed for every partition that broker 0 previously led. However, the earliest offset has also changed. I was under the impression that every replica of a partition must have the same offset range, otherwise it would confuse the consumer of the partition. For example, I ran into an issue during a failover test where my consumer tried to request an offset from a partition on the new leader, but the offset didn't exist (it was earlier than the earliest offset in that partition).

Can anybody explain what is happening? Here is my code that prints the partition leader and offset information:

https://gist.github.com/lukeforehand/c37e22aea7192e00fff5

Thanks,
Luke