Re: ISR not a replica

2015-07-10 Thread Guozhang Wang
OK, it seems you had a controller migration some time ago, and the old controller (broker 0) did not de-register its listeners even though its controller modules, such as the "partition state machine", had already been shut down. You can try to verify this through the active-controller metrics. If that is the ca

Re: ISR not a replica

2015-07-10 Thread Krishna Kumar
Yes, there were messages in the controller logs such as DEBUG [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [topic1,2]. Pick the leader from the alive assigned replicas: (kafka.controller.OfflinePartitionLeaderSelector) ERROR [Partition state machine on Controller 0]: Error whil

Re: ISR not a replica

2015-07-10 Thread Guozhang Wang
Krish, If you only add a new broker (for example broker 3) into your cluster without doing anything else, this broker will not automatically get any topic-partitions migrated to itself, so I suspect there are at least some admin tools executed. The log exceptions you showed in the previous emails

Re: ISR not a replica

2015-07-10 Thread Krishna Kumar
So we think we have a process to fix this issue via ZooKeeper – If anyone has any thoughts, please let me know. First get the “state” from a good partition, to get the correct epochs: In /usr/local/zookeeper/zkCli.sh [zk: localhost:2181(CONNECTED) 4] get /brokers/topics/topic1/partitions/6/stat
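The first step above (reading the "state" of a healthy partition to get the correct epochs) can be sketched in Python. This is only an illustration: it assumes the znode holds the JSON layout used by Kafka 0.8 (`controller_epoch`, `leader`, `version`, `leader_epoch`, `isr`), and the sample values are hypothetical, not taken from this cluster. The rest of the procedure is cut off in this archive, so nothing beyond that first step is shown.

```python
import json

# Hypothetical example of what zkCli's `get` might print for a healthy
# partition's state znode; the numbers are made up for illustration.
GOOD_STATE = '{"controller_epoch":7,"leader":1,"version":1,"leader_epoch":12,"isr":[1,0,2]}'


def extract_epochs(state_json):
    """Return the controller and leader epochs from a partition state JSON string."""
    state = json.loads(state_json)
    return {
        "controller_epoch": state["controller_epoch"],
        "leader_epoch": state["leader_epoch"],
    }


if __name__ == "__main__":
    print(extract_epochs(GOOD_STATE))
```

Reading the epochs programmatically avoids copying them by hand from the zkCli output, which is where a typo would be easy to make.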

Re: ISR not a replica

2015-07-09 Thread Krishna Kumar
Well, 3 (the new node) was shut down, so there were no messages there. “1” was the leader and we saw the messages on “0” and “2”. We managed to resolve this new problem to an extent by shutting down “1”. We were worried because “1” was the only replica in the ISR. But once it went down, “0” and “2

Re: ISR not a replica

2015-07-09 Thread Guozhang Wang
Krish, Do brokers 0 and 3 have warn log entries similar to broker 2's for stale controller epochs? Guozhang On Thu, Jul 9, 2015 at 2:07 PM, Krishna Kumar wrote: > So we tried taking that node down. But that didn't fix the issue, so we > restarted the other nodes. > > This seems to have led

Re: ISR not a replica

2015-07-09 Thread Krishna Kumar
So we tried taking that node down. But that didn't fix the issue, so we restarted the other nodes. This seems to have led to 2 of the other replicas dropping out of the ISR for *all* topics. Topic: topic2 Partition: 0 Leader: 1 Replicas: 1,0,2 Isr: 1 Topic: topic2 Partition: 1

Re: ISR not a replica

2015-07-09 Thread Krishna Kumar
Thanks Guozhang We did do the partition-assignment, but against another topic, and that went well. But this happened for this topic without doing anything. Regards Krish On 7/9/15, 3:56 PM, "Guozhang Wang" wrote: >Krishna, > >Did you run any admin tools after adding the node (I assume it is n

Re: ISR not a replica

2015-07-09 Thread Guozhang Wang
Krishna, Did you run any admin tools after adding the node (I assume it is node 3), like partition-assignment? It is shown as the only one in the ISR list but not in the replica list, which suggests that the partition migration process did not complete. You can verify if this is the case by checking yo
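The symptom described here (a broker listed in the ISR but not in the replica list) can be checked mechanically against the topic description output. The following is a minimal Python sketch, assuming the `Topic: ... Partition: ... Leader: ... Replicas: ... Isr: ...` line format quoted in this thread. The function name and the second sample line are hypothetical, added only for illustration.

```python
import re

# Matches one partition entry in the describe output format quoted in this thread.
ENTRY_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]*)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def parse_ids(field):
    """Turn a comma-separated broker list such as '2,1,0' into a list of ints."""
    return [int(x) for x in field.split(",") if x]


def find_isr_outside_replicas(describe_output):
    """Return (topic, partition, stray_brokers) for every partition whose ISR
    contains a broker that is not in its assigned replica list."""
    problems = []
    for m in ENTRY_RE.finditer(describe_output):
        replicas = set(parse_ids(m.group("replicas")))
        isr = set(parse_ids(m.group("isr")))
        strays = sorted(isr - replicas)
        if strays:
            problems.append((m.group("topic"), int(m.group("partition")), strays))
    return problems


# The first line is from the original report; the second is a hypothetical
# example of the bad state, not actual output from this cluster.
SAMPLE = (
    "Topic: topic1 Partition: 0 Leader: 2 Replicas: 2,1,0 Isr: 2,0,1\n"
    "Topic: topic1 Partition: 1 Leader: 3 Replicas: 0,1,2 Isr: 3\n"
)

if __name__ == "__main__":
    for topic, partition, strays in find_isr_outside_replicas(SAMPLE):
        print(f"{topic}-{partition}: ISR brokers not in replica list: {strays}")
```

Any partition this flags is a candidate for the incomplete-migration case, and the controller logs for that partition are the place to look next.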

ISR not a replica

2015-07-09 Thread Krishna Kumar
Hi, We added a Kafka node and it suddenly became the leader and the sole replica for some partitions, but it is not in the ISR. Any idea how we might be able to fix this? We are on Kafka 0.8.2. Topic: topic1 Partition: 0 Leader: 2 Replicas: 2,1,0 Isr: 2,0,1 Topic: topic1 Partitio