Re: How replicas catch up the leader

2015-03-11 Thread sy.pan
Hi @Jiangjie Qin, this is the related info from controller.log: [2015-03-11 10:54:11,962] ERROR [Controller 0]: Error completing reassignment of partition [ad_click_sts,3] (kafka.controller.KafkaController) kafka.common.KafkaException: Partition [ad_click_sts,3] to be reassigned is already assi

Re: How replicas catch up the leader

2015-03-10 Thread Jiangjie Qin
It looks like in your case this is because broker 1 somehow missed a controller LeaderAndIsrRequest for [ad_click_sts,4]. So the zkVersion would differ from the value stored in ZooKeeper from then on. Therefore broker 1 failed to update the ISR. In this case you have to bounce the broker to fix it. Fro
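To illustrate why a stale zkVersion blocks the ISR update, here is a minimal sketch of a version-checked (conditional) write. It is a toy model, not Kafka's or ZooKeeper's actual code: the `VersionedStore` class and its `conditional_update` method are hypothetical names, and the stored values are made up for illustration.

```python
class VersionedStore:
    """Toy stand-in for a ZooKeeper znode: data plus a version counter."""

    def __init__(self, data):
        self.data = data
        self.version = 0

    def conditional_update(self, new_data, expected_version):
        """Write only if the caller's cached version matches the stored one.

        Returns (success, current_version).
        """
        if expected_version != self.version:
            return False, self.version
        self.data = new_data
        self.version += 1
        return True, self.version


# Hypothetical partition state; the broker caches the version it last saw.
store = VersionedStore({"leader": 1, "isr": [1, 2]})
broker_cached_version = store.version

# Another writer (e.g. the controller) updates the state. The broker misses
# the notification, so its cached version is now stale.
store.conditional_update({"leader": 1, "isr": [1]}, expected_version=0)

# The broker's write with the stale version is rejected.
ok_stale, current = store.conditional_update(
    {"leader": 1, "isr": [1, 2]}, expected_version=broker_cached_version
)

# Once the broker learns the current version, the same write succeeds.
ok_fresh, _ = store.conditional_update(
    {"leader": 1, "isr": [1, 2]}, expected_version=current
)
print(ok_stale, ok_fresh)
```

In this model the rejected write keeps failing until the cached version is refreshed, which is presumably what bouncing the broker achieves in the real system.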

Re: How replicas catch up the leader

2015-03-10 Thread sy.pan
@tao xiao and Jiangjie Qin, thank you very much. I tried running kafka-reassign-partitions.sh, but the issue still exists. This is the log info: [2015-03-11 11:00:40,086] ERROR Conditional update of path /brokers/topics/ad_click_sts/partitions/4/state with data {"controller_epoch":23,"leader":1,"ver

Re: How replicas catch up the leader

2015-03-10 Thread Jiangjie Qin
This looks like a leader broker somehow did not respond to a fetch request from the follower. It may be because the broker was too busy. If that is the case, Xiao's approach could help - reassign partitions or reelect leaders to balance the traffic among brokers. Jiangjie (Becket) Qin On 3/9/15,

Re: How replicas catch up the leader

2015-03-10 Thread tao xiao
I ended up running kafka-reassign-partitions.sh to reassign partitions to different nodes. On Tue, Mar 10, 2015 at 11:31 AM, sy.pan wrote: > Hi, tao xiao and Jiangjie Qin > > I ran into the same issue; my node had recovered from a high load > problem (caused by another application) > > this is
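For readers who have not used the tool: kafka-reassign-partitions.sh takes a JSON file describing the target replica list for each partition. The sketch below builds such a file in Python. The topic name and partition numbers come from this thread, but the target broker ids are hypothetical, and `build_reassignment` is an illustrative helper, not part of Kafka.

```python
import json


def build_reassignment(topic, assignment):
    """Build a reassignment plan from {partition: [broker ids]}."""
    return {
        "version": 1,
        "partitions": [
            {"topic": topic, "partition": partition, "replicas": list(replicas)}
            for partition, replicas in sorted(assignment.items())
        ],
    }


# Hypothetical target brokers for partitions 3 and 4 (replication factor 2).
plan = build_reassignment("ad_click_sts", {3: [0, 2], 4: [2, 0]})
plan_json = json.dumps(plan, indent=2)
print(plan_json)
```

The printed JSON can be saved to a file and passed to the tool with its `--reassignment-json-file` option together with `--execute`; check the options against the Kafka version in use before running it.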

Re: How replicas catch up the leader

2015-03-09 Thread sy.pan
Hi, tao xiao and Jiangjie Qin. I ran into the same issue; my node had recovered from a high load problem (caused by another application). This is what kafka-topics shows: Topic:ad_click_sts PartitionCount:6 ReplicationFactor:2 Configs: Topic: ad_click_sts Partition: 0

Re: How replicas catch up the leader

2015-02-28 Thread Jiangjie Qin
Can you check if your replica fetcher thread is still running on broker 1? Also, you may check the public access log on broker 5 to see if there are fetch requests from broker 1. On 2/28/15, 12:39 AM, "tao xiao" wrote: >Thanks Harsha. In my case the replica doesn't catch up at all. the last >log

Re: How replicas catch up the leader

2015-02-28 Thread tao xiao
Thanks Harsha. In my case the replica doesn't catch up at all. The last log date is 5 days ago. It seems the failed replica is excluded from the replication list. I am looking for a command that can add the replica back to the ISR list or force it to start syncing again. On Sat, Feb 28, 2015 at 4:27 PM

Re: How replicas catch up the leader

2015-02-28 Thread Harsha
You can increase num.replica.fetchers (by default it's 1) and also try increasing replica.fetch.max.bytes. -Harsha On Fri, Feb 27, 2015, at 11:15 PM, tao xiao wrote: > Hi team, > > I had a replica node that was shut down improperly due to no disk space > left. I managed to clean up the disk and restar

How replicas catch up the leader

2015-02-27 Thread tao xiao
Hi team, I had a replica node that was shut down improperly because it ran out of disk space. I managed to clean up the disk and restarted the replica, but since then the replica has never caught up with the leader, as shown below: Topic:test PartitionCount:1 ReplicationFactor:3 Configs: Topic: test Partition: 0 Leade
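A quick way to spot the symptom described here is to compare the Replicas and Isr columns of the kafka-topics describe output. The following is a minimal sketch in Python. The sample text is hypothetical: it assumes the usual tab-separated layout, with broker 5 as leader and broker 1 as the lagging replica as discussed in the replies, and a made-up third broker id 6. `lagging_replicas` is an illustrative helper, not a Kafka tool.

```python
import re

# Matches one per-partition line of the describe output (assumed layout).
LINE_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def lagging_replicas(describe_output):
    """Return {(topic, partition): [replicas missing from the ISR]}."""
    result = {}
    for line in describe_output.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # summary line or unrelated text
        replicas = [int(x) for x in match.group("replicas").split(",")]
        isr = {int(x) for x in match.group("isr").split(",") if x}
        missing = [r for r in replicas if r not in isr]
        if missing:
            result[(match.group("topic"), int(match.group("partition")))] = missing
    return result


# Hypothetical describe output; broker id 6 is invented for illustration.
SAMPLE = (
    "Topic:test\tPartitionCount:1\tReplicationFactor:3\tConfigs:\n"
    "\tTopic: test\tPartition: 0\tLeader: 5\tReplicas: 1,5,6\tIsr: 5,6\n"
)

report = lagging_replicas(SAMPLE)
print(report)
```

Any partition that appears in the report has at least one assigned replica that is not in sync with the leader.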