Hi Qi,

Just to confirm, are you seeing the offset reset to 0 in the new consumer?

I am not sure about the root cause of the leader election failure. However,
since the new Kafka consumer stores its offsets in Kafka, the offsets for
some topic partitions can be reset to 0 when the leader for an offsets topic
partition becomes unavailable.
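
For context, here is a rough sketch of how a consumer group is mapped to a
single partition of the offsets topic. It assumes the default of 50 offsets
topic partitions (offsets.topic.num.partitions) and uses a hypothetical group
name; it only mirrors the hash-and-modulo mapping as I understand it and is
not the broker code itself. If the one partition a group maps to has no
leader, that group's committed offsets cannot be read or written.

```java
public class OffsetsPartition {
    // Sketch of the group-to-partition mapping: abs(groupId.hashCode) % partitionCount,
    // with an Integer.MIN_VALUE hash treated as 0 so the result is never negative.
    static int partitionFor(String groupId, int partitionCount) {
        int hash = groupId.hashCode();
        int positive = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
        return positive % partitionCount;
    }

    public static void main(String[] args) {
        String groupId = "my-consumer-group"; // hypothetical group name
        int partitionCount = 50;              // assumed default partition count
        System.out.println("Offsets topic partition for " + groupId + ": "
                + partitionFor(groupId, partitionCount));
    }
}
```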

Thanks,
Liquan

On Wed, Apr 20, 2016 at 5:23 PM, Qi Xu <shkir...@gmail.com> wrote:

> Hi folks,
> Recently we ran into an odd issue where the latest offset of some
> partitions became 0. Here's a snapshot from Kafka Manager. As you can see,
> the latest offset of partitions 2 and 3 is zero.
>
> Partition  Latest Offset  Leader  Replicas  In Sync Replicas  Preferred Leader?  Under Replicated?
> ---------  -------------  ------  --------  ----------------  -----------------  -----------------
> 0          25822061       3       (3,4,5)   (3,5,4)           true               false
> 1          25822388       4       (4,5,1)   (4,1,5)           true               false
> 2          0              2       (5,1,2)   (2)               false              true
> 3          0              2       (1,2,3)   (3,2)             false              true
>
> On the Kafka controller node, I saw some errors like the one below in the
> state-change log. The timing seems to match, but I'm not sure whether it's
> related.
>
> [2016-04-14 19:59:21,800] ERROR Controller 3 epoch 74174 initiated state
> change for partition [topic,2] from OnlinePartition to OnlinePartition
> failed (state.change.logger)
> kafka.common.StateChangeFailedException: encountered error while electing
> leader for partition [topic,2] due to: Preferred replica 1 for partition
> [topic,2] is either not alive or not in the isr. Current leader and ISR:
> [{"leader":2,"leader_epoch":169,"isr":[2]}].
>
>
> When this happens, all of the partitions with a zero latest offset fail to
> get new data. After we restart the controller, everything goes back to
> normal.
>
> Have you seen a similar issue before, and do you have any idea about the
> root cause? What other information would you suggest collecting to get to
> the root cause?
>
> Thanks,
> Qi
>
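
As a side note on the quoted state-change error: the condition it reports
can be sketched roughly as below. This is only an illustration of the check
named in the message (the preferred replica must be alive and in the ISR),
not the controller's actual code. The preferred replica and ISR come from
your log; the set of live brokers is my assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PreferredLeaderCheck {
    // The preferred replica can only be elected if it is both alive and in the ISR.
    static boolean canElectPreferred(int preferred, Set<Integer> liveBrokers, List<Integer> isr) {
        return liveBrokers.contains(preferred) && isr.contains(preferred);
    }

    public static void main(String[] args) {
        // From the quoted log for [topic,2]: preferred replica 1, ISR [2].
        int preferred = 1;
        List<Integer> isr = Arrays.asList(2);
        // Assumption: all five brokers are alive; the log does not say.
        Set<Integer> liveBrokers = new HashSet<>(Arrays.asList(1, 2, 3, 4, 5));
        System.out.println("Can elect preferred replica: "
                + canElectPreferred(preferred, liveBrokers, isr));
    }
}
```

Even with broker 1 alive, the election fails for as long as it is missing
from the ISR, which matches what the log shows.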



-- 
Liquan Pei
Software Engineer, Confluent Inc
