Yu,

The controller is the broker whose ActiveControllerCount JMX value is 1. At any point in time, only one broker in a Kafka cluster should have a value of 1 for this JMX MBean.
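As a rough illustration of that rule, here is a minimal Python sketch that picks the controller out of per-broker ActiveControllerCount readings. It assumes the values have already been collected from each broker's JMX endpoint by some other means; the function name and the input dict are hypothetical and not part of Kafka.

```python
def find_controller(active_controller_counts):
    """Return the id of the single broker reporting ActiveControllerCount == 1.

    active_controller_counts is a hypothetical dict of broker id -> JMX value,
    gathered beforehand from each broker (how it is gathered is not shown).
    """
    controllers = sorted(
        broker
        for broker, count in active_controller_counts.items()
        if count == 1
    )
    # A healthy cluster has exactly one controller; anything else is an error.
    if len(controllers) != 1:
        raise ValueError(
            "expected exactly one controller, found %d: %s"
            % (len(controllers), controllers)
        )
    return controllers[0]
```

For a three-broker cluster where only broker 3 reports 1, `find_controller({1: 0, 2: 0, 3: 1})` returns 3. Zero or multiple brokers reporting 1 raises a `ValueError`, since either case violates the rule described above.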
I personally find it very complex to find the replica fetcher thread's lag for a particular partition that is under-replicated. I think we should have a tool that will take in a topic, partition and ZooKeeper URL and give the lag for all the replicas for that partition. I will file a JIRA for this.

Thanks,
Neha

On Wed, Aug 21, 2013 at 1:41 PM, Yu, Libo <libo...@citi.com> wrote:

> We have 3 brokers in our Kafka cluster (1,2,3). Broker 2 somehow is not in
> the ISR. I restarted it and it did not help at all. And we notice that in
> many cases we have to restart the whole cluster to get it back. This is our
> top priority concern currently.
>
> Here is the log after the restart:
>
> [2013-08-21 16:17:18,992] INFO Registered broker 2 at path /brokers/ids/2 with address xxxx:1234. (kafka.utils.ZkUtils$)
> [2013-08-21 16:17:18,992] INFO [Kafka Server 2], Connecting to ZK: xxxx:1234, yyyy:1234, zzzz:1234 (kafka.server.KafkaServer)
> [2013-08-21 16:17:19,061] INFO Will not load MX4J, mx4j-tools.jar is not in the classpath (kafka.utils.Mx4jLoader$)
> [2013-08-21 16:17:19,072] INFO conflict in /controller data: 2 stored data: 3 (kafka.utils.ZkUtils$)
> [2013-08-21 16:17:19,082] INFO [Kafka Server 2], started (kafka.server.KafkaServer)
> [2013-08-21 16:17:49,774] INFO Closing socket connection to /123.456.789. (kafka.network.Processor)
> ......
>
> Regards,
>
> Libo
>
>
> -----Original Message-----
> From: Yu, Libo [ICG-IT]
> Sent: Wednesday, August 21, 2013 3:15 PM
> To: 'users@kafka.apache.org'
> Subject: RE: How to get broker back to ISR
>
> Hi Neha,
>
> Which broker is the controller broker and how is it defined?
>
> Regards,
>
> Libo
>
>
> -----Original Message-----
> From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
> Sent: Tuesday, August 20, 2013 10:56 AM
> To: users@kafka.apache.org
> Subject: Re: How to get broker back to ISR
>
> Once the broker is restarted, the controller broker will send it a list of
> partitions that it should follow.
> The broker starts fetching from the respective leaders and enters the ISR.
> Depending on the duration of the shutdown, the broker can take some time to
> enter the ISR.
>
> Thanks,
> Neha
>
> On Aug 20, 2013 4:26 AM, "James Wu" <jameswu...@gmail.com> wrote:
>
> > Hi,
> >
> > I am wondering, if my leader broker crashes, how do I get it back into the
> > ISR after restarting Kafka?
> >
> > In the initial state, kafka-list-topic.sh shows:
> > topic: failover-test partition: 0 leader: 0 replicas: 0,1 isr: 0,1
> >
> > If I terminate the leader, kafka-list-topic.sh shows:
> > topic: failover-test partition: 0 leader: 1 replicas: 0,1 isr: 1
> >
> > Is there any document that explains the procedure to get my
> > broker0 back into the ISR?
> >
> > Thanks!
> >
> > --
> > Friendly regards,
> >
> > *James Wu <http://www.facebook.com/jameswu629>*
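
For the kafka-list-topic.sh output quoted in the original question, a small Python sketch like the following could flag which replicas are currently missing from the ISR. It assumes each line has exactly the "key: value" layout shown in this thread; the function name is made up for illustration, and this is not an official Kafka tool.

```python
import re


def out_of_sync_replicas(line):
    """Return the replica ids listed under 'replicas' but absent from 'isr'.

    Assumes one line of kafka-list-topic.sh output in the form quoted in
    this thread, e.g.
    'topic: failover-test partition: 0 leader: 1 replicas: 0,1 isr: 1'.
    """
    # Collect "key: value" pairs; values contain no whitespace.
    fields = dict(re.findall(r"(\w+):\s*(\S+)", line))
    replicas = [int(b) for b in fields["replicas"].split(",")]
    # An empty or missing isr field is treated as an empty ISR.
    isr = {int(b) for b in fields.get("isr", "").split(",") if b}
    return [b for b in replicas if b not in isr]


line = "topic: failover-test partition: 0 leader: 1 replicas: 0,1 isr: 1"
print(out_of_sync_replicas(line))  # prints [0]
```

Re-running such a check after restarting broker 0 would show an empty list once the broker has caught up and rejoined the ISR.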