Hi Qi,

Just to confirm, are you seeing the offset reset to 0 in the new consumer?
I am not sure of the root cause of the leader election failure. But because the new Kafka consumer stores its offsets in Kafka, it is possible for the offset to be reset to 0 for some topic partitions when the leader for an offsets topic partition becomes unavailable.

Thanks,
Liquan

On Wed, Apr 20, 2016 at 5:23 PM, Qi Xu <shkir...@gmail.com> wrote:

> Hi folks,
> Recently we ran into an odd issue where some partitions' latest offset
> becomes 0. Here's the snapshot from Kafka Manager. As you can see,
> the latest offset of partitions 2 and 3 has become zero.
>
> Partition  Latest Offset  Leader                                            Replicas  In Sync Replicas  Preferred Leader?  Under Replicated?
> 0          25822061       3 <http://10.1.49.4:9000/clusters/ppe/brokers/3>  (3,4,5)   (3,5,4)           true               false
> 1          25822388       4 <http://10.1.49.4:9000/clusters/ppe/brokers/4>  (4,5,1)   (4,1,5)           true               false
> 2          0              2 <http://10.1.49.4:9000/clusters/ppe/brokers/2>  (5,1,2)   (2)               false              true
> 3          0              2 <http://10.1.49.4:9000/clusters/ppe/brokers/2>  (1,2,3)   (3,2)             false              true
>
> On the Kafka Controller node, I saw some errors like the one below in the
> state-change log. The timing seems to match, but I'm not sure whether it's
> related.
>
> [2016-04-14 19:59:21,800] ERROR Controller 3 epoch 74174 initiated state
> change for partition [topic,2] from OnlinePartition to OnlinePartition
> failed (state.change.logger)
> kafka.common.StateChangeFailedException: encountered error while electing
> leader for partition [topic,2] due to: Preferred replica 1 for partition
> [topic,2] is either not alive or not in the isr. Current leader and ISR:
> [{"leader":2,"leader_epoch":169,"isr":[2]}].
>
> When this happens, all the partitions with a zero latest offset fail to
> get new data. After we restart the controller, everything goes back to
> normal.
>
> Have you seen a similar issue before, and do you have any idea about the
> root cause? What other information do you suggest we collect to get to
> the root cause?
>
> Thanks,
> Qi
>

--
Liquan Pei
Software Engineer, Confluent Inc
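
P.S. As one piece of information to collect, here is a minimal sketch (not something either of us has run against the cluster) that takes the rows of the Kafka Manager snapshot quoted above and lists, per partition, a zero latest offset and any replicas missing from the ISR. The names PartitionRow, describe_problems and flagged_partitions are hypothetical helpers made up for this illustration, and the rows are typed in by hand from the snapshot rather than read from Kafka.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PartitionRow:
    """One row of the Kafka Manager snapshot (hypothetical helper type)."""
    partition: int
    latest_offset: int
    leader: int
    replicas: Tuple[int, ...]
    isr: Tuple[int, ...]


def describe_problems(row: PartitionRow) -> List[str]:
    """Return human-readable findings for a single partition row."""
    problems = []
    if row.latest_offset == 0:
        problems.append("latest offset is 0")
    # Brokers assigned to the partition that are not in the in-sync set.
    missing = sorted(set(row.replicas) - set(row.isr))
    if missing:
        problems.append(f"replicas missing from ISR: {missing}")
    return problems


def flagged_partitions(rows: List[PartitionRow]) -> Dict[int, List[str]]:
    """Map partition id to its findings, skipping healthy partitions."""
    result = {}
    for row in rows:
        problems = describe_problems(row)
        if problems:
            result[row.partition] = problems
    return result


# Values copied by hand from the snapshot in the quoted message.
SNAPSHOT = [
    PartitionRow(0, 25822061, 3, (3, 4, 5), (3, 5, 4)),
    PartitionRow(1, 25822388, 4, (4, 5, 1), (4, 1, 5)),
    PartitionRow(2, 0, 2, (5, 1, 2), (2,)),
    PartitionRow(3, 0, 2, (1, 2, 3), (3, 2)),
]

if __name__ == "__main__":
    for partition, problems in flagged_partitions(SNAPSHOT).items():
        print(f"partition {partition}: " + "; ".join(problems))
```

On the snapshot data this flags only partitions 2 and 3, and broker 1 is missing from the ISR of both, which is consistent with the "Preferred replica 1 ... not in the isr" error in the state-change log. That may be a coincidence, so treat it as a pointer for where to look (broker 1's logs around that time), not as a diagnosis.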