I checked the max lag and it was 0. I grepped the state-change logs for topic-partition [org.nginx,32] and extracted the entries related to broker 24 and broker 29 (the controller switched from broker 24 to broker 29).
- on broker 29 (current controller):

[2014-11-22 06:20:20,377] TRACE Controller 29 epoch 7 changed state of replica 29 for partition [org.nginx,32] from OnlineReplica to OnlineReplica (state.change.logger)
*[2014-11-22 06:20:20,650] TRACE Controller 29 epoch 7 sending become-leader LeaderAndIsr request (Leader:29,ISR:29,24,LeaderEpoch:10,ControllerEpoch:4) with correlationId 0 to broker 29 for partition [org.nginx,32] (state.change.logger)*
[2014-11-22 06:20:20,664] TRACE Broker 29 received LeaderAndIsr request (LeaderAndIsrInfo:(Leader:29,ISR:29,24,LeaderEpoch:10,ControllerEpoch:4),ReplicationFactor:2),AllReplicas:29,24) correlation id 0 from controller 29 epoch 7 for partition [org.nginx,32] (state.change.logger)
*[2014-11-22 06:20:20,674] WARN Broker 29 received invalid LeaderAndIsr request with correlation id 0 from controller 29 epoch 7 with an older leader epoch 10 for partition [org.nginx,32], current leader epoch is 10 (state.change.logger)*
[2014-11-22 06:20:20,912] TRACE Controller 29 epoch 7 sending UpdateMetadata request (Leader:29,ISR:29,24,LeaderEpoch:10,ControllerEpoch:4) with correlationId 0 to broker 23 for partition [org.nginx,32] (state.change.logger)
*[2014-11-22 06:20:21,490] TRACE Controller 29 epoch 7 sending UpdateMetadata request (Leader:29,ISR:29,24,LeaderEpoch:10,ControllerEpoch:4) with correlationId 0 to broker 29 for partition [org.nginx,32] (state.change.logger)*
*[2014-11-22 06:20:21,945] TRACE Broker 29 cached leader info (LeaderAndIsrInfo:(Leader:29,ISR:29,24,LeaderEpoch:10,ControllerEpoch:4),ReplicationFactor:2),AllReplicas:29,24) for partition [org.nginx,32] in response to UpdateMetadata request sent by controller 29 epoch 7 with correlation id 0 (state.change.logger)*
[2014-11-22 06:20:28,703] TRACE Broker 29 received LeaderAndIsr request (LeaderAndIsrInfo:(Leader:29,ISR:29,LeaderEpoch:11,ControllerEpoch:6),ReplicationFactor:2),AllReplicas:29,24) correlation id 4897 from controller 24 epoch 6 for partition [org.nginx,32] (state.change.logger)
[2014-11-22 06:20:28,703] WARN Broker 29 received LeaderAndIsr request correlation id 4897 with an old controller epoch 6. Latest known controller epoch is 7 (state.change.logger)

*analysis:* controller 29 sent a become-leader LeaderAndIsr request carrying old epoch information (LeaderEpoch:10, ControllerEpoch:4), and broker 29 itself deemed the request invalid, as did the other brokers. Controller 29 then sent UpdateMetadata requests to all brokers, and the brokers cached the leader info with that old controller epoch.

*question:* when the controller sends a become-leader LeaderAndIsr request or an UpdateMetadata request, does it check the current controller epoch and leader epoch? It looks like the controller did not do any checking. Meanwhile, brokers reject a LeaderAndIsr request carrying an old controller epoch (or an old leader epoch), but they still process the UpdateMetadata request and cache the stale leader info.
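My mental model of the broker-side checks implied by these log lines is roughly the Scala sketch below. This is only my own simplified illustration, not the real 0.8.1.1 ReplicaManager/MetadataCache code; the class, field and method names are all invented:

// Hypothetical, simplified model of the broker-side checks suggested by the
// state-change log above. Not the real Kafka code; names are made up.
case class LeaderAndIsrInfo(leader: Int, isr: Seq[Int], leaderEpoch: Int, controllerEpoch: Int)

class BrokerState(var latestControllerEpoch: Int,
                  var leaderEpochs: Map[String, Int],                  // partition -> leader epoch
                  var metadataCache: Map[String, LeaderAndIsrInfo]) {  // partition -> cached leader info

  // LeaderAndIsr: rejected if the sending controller's epoch is stale,
  // or if the carried leader epoch is not newer than what we already have.
  def handleLeaderAndIsr(partition: String, requestControllerEpoch: Int, info: LeaderAndIsrInfo): Unit = {
    if (requestControllerEpoch < latestControllerEpoch)
      println(s"WARN old controller epoch $requestControllerEpoch, latest known is $latestControllerEpoch")   // controller 24's request
    else if (info.leaderEpoch <= leaderEpochs.getOrElse(partition, -1))
      println(s"WARN older leader epoch ${info.leaderEpoch}, current leader epoch is ${leaderEpochs(partition)}")  // controller 29's request
    else
      leaderEpochs += partition -> info.leaderEpoch   // this is where become-leader/follower would actually happen
  }

  // UpdateMetadata: only the controller epoch is checked, so controller 29
  // (epoch 7) can still push a stale LeaderAndIsrInfo into the cache.
  def handleUpdateMetadata(partition: String, requestControllerEpoch: Int, info: LeaderAndIsrInfo): Unit = {
    if (requestControllerEpoch >= latestControllerEpoch) {
      latestControllerEpoch = requestControllerEpoch
      metadataCache += partition -> info
    }
  }
}

If that model is roughly right, it would explain why every broker ended up caching the stale (LeaderEpoch:10, ControllerEpoch:4) info while rejecting both controllers' LeaderAndIsr requests.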
- on broker 24 (previous controller):

[2014-11-22 06:18:11,095] TRACE Controller 24 epoch 6 sending UpdateMetadata request (Leader:29,ISR:29,24,LeaderEpoch:10,ControllerEpoch:4) with correlationId 4886 to broker 36 for partition [org.nginx,32] (state.change.logger)
[2014-11-22 06:20:17,553] TRACE Controller 24 epoch 6 sending UpdateMetadata request (Leader:29,ISR:29,24,LeaderEpoch:10,ControllerEpoch:4) with correlationId 4892 to broker 34 for partition [org.nginx,32] (state.change.logger)
[2014-11-22 06:20:21,905] TRACE Controller 24 epoch 6 started leader election for partition [org.mobile_grouprecommend_userdeletedata,9] (state.change.logger)
[2014-11-22 06:20:21,911] TRACE Controller 24 epoch 6 elected leader 21 for Offline partition [org.mobile_grouprecommend_userdeletedata,9] (state.change.logger)
[2014-11-22 06:20:27,412] TRACE Controller 24 epoch 6 changed state of replica 24 for partition [org.nginx,32] from OnlineReplica to OfflineReplica (state.change.logger)
*[2014-11-22 06:20:28,701] TRACE Controller 24 epoch 6 sending become-leader LeaderAndIsr request (Leader:29,ISR:29,LeaderEpoch:11,ControllerEpoch:6) with correlationId 4897 to broker 29 for partition [org.nginx,32] (state.change.logger)*
[2014-11-22 06:20:28,713] TRACE Controller 24 epoch 6 sending UpdateMetadata request (Leader:29,ISR:29,LeaderEpoch:11,ControllerEpoch:6) with correlationId 4897 to broker 23 for partition [org.nginx,32] (state.change.logger)

*analysis:* controller 24 and controller 29 were alive at the same time. Controller 24 sent a become-leader LeaderAndIsr request to broker 29, and broker 29 found it carried an old controller epoch and did not process it.

*question:* can two controllers be alive at the same time? I think that should not happen. Both controller 24 and controller 29 sent LeaderAndIsr requests to the other brokers. Controller 24 had the newer LeaderEpoch (LeaderAndIsrInfo:(Leader:29,ISR:29,LeaderEpoch:11,ControllerEpoch:6)), while controller 29 had the newer controller epoch (epoch 7) but the old LeaderAndIsrInfo. Brokers reject LeaderAndIsr requests with either an old controller epoch or an old leader epoch, so neither of the two controllers could update the LeaderAndIsrInfo. Why doesn't the newer controller update the LeaderAndIsrInfo, and when will it be updated? In such a case, how can it be resolved?

*some other info*

- *other brokers also reject the old LeaderAndIsr request:*

[2014-11-22 06:20:28,701] TRACE Broker 20 received LeaderAndIsr request (LeaderAndIsrInfo:(Leader:20,ISR:20,LeaderEpoch:20,ControllerEpoch:6),ReplicationFactor:2),AllReplicas:20,24) correlation id 4897 from controller 24 epoch 6 for partition [org.mobile_pagetracklog,32] (state.change.logger)
[2014-11-22 06:20:28,701] WARN Broker 20 received LeaderAndIsr request correlation id 4897 with an old controller epoch 6. Latest known controller epoch is 7 (state.change.logger)

- *the older controller will eventually fail when writing to zookeeper if it conflicts with the newer controller, but I don't know when it will terminate its thread* (see the sketch after the stack trace below):

[2014-11-22 06:20:28,003] ERROR Controller 24 epoch 6 initiated state change of replica 24 for partition [binlog.newsletter_binlog,12] from OnlineReplica to OfflineReplica failed (state.change.logger)
kafka.common.StateChangeFailedException: Leader and isr path written by another controller. This probably means the current controller with epoch 6 went through a soft failure and another controller was elected with epoch 7. Aborting state change by this controller
        at kafka.controller.KafkaController.removeReplicaFromIsr(KafkaController.scala:967)
        at kafka.controller.ReplicaStateMachine.handleStateChange(ReplicaStateMachine.scala:232)
        at kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:96)
        at kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:96)
        at scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153)
        at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
        at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
        at kafka.controller.ReplicaStateMachine.handleStateChanges(ReplicaStateMachine.scala:96)
        at kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:438)
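For my own understanding, the pattern that exception suggests to me is a guarded, version-checked write to the partition state path, roughly like the Scala sketch below. This is just my guess at its shape, not the actual KafkaController/ZkUtils code; the method, the JSON handling and the epoch comparison are invented for illustration:

import org.apache.zookeeper.{KeeperException, ZooKeeper}
import org.apache.zookeeper.data.Stat

// Hypothetical illustration only: the state znode content carries a
// controller_epoch, and the write is conditioned on the dataVersion read
// beforehand, so a controller with a stale epoch has to abort.
def tryUpdatePartitionState(zk: ZooKeeper, topic: String, partition: Int,
                            myControllerEpoch: Int, newStateJson: String): Boolean = {
  val path = s"/kafka08/brokers/topics/$topic/partitions/$partition/state"
  val stat = new Stat()
  val current = new String(zk.getData(path, false, stat), "UTF-8")

  // crude extraction of controller_epoch from the stored JSON,
  // e.g. {"controller_epoch":6,"leader":29,...}
  val storedEpoch = """"controller_epoch":(\d+)""".r.findFirstMatchIn(current)
    .map(_.group(1).toInt).getOrElse(-1)

  if (storedEpoch > myControllerEpoch) {
    // "Leader and isr path written by another controller" -- abort,
    // which is what controller 24 (epoch 6) ran into above
    false
  } else try {
    // conditional write: fails if someone modified the znode since we read it
    zk.setData(path, newStateJson.getBytes("UTF-8"), stat.getVersion)
    true
  } catch {
    case _: KeeperException.BadVersionException => false
  }
}

What I still don't understand is why the newer controller (epoch 7) never rewrote the path itself; as shown further down in the thread, the znode still holds controller_epoch 6 and leader_epoch 11.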
On Sun, Nov 30, 2014 at 12:22 AM, Jun Rao <jun...@gmail.com> wrote:

> Could you check the state-change log of the follower replica and see if it
> received the corresponding LeaderAndIsr request? If so, could you check the
> max lag jmx (http://kafka.apache.org/documentation.html) in the follower
> replica to see what the lag is?
>
> Thanks,
>
> Jun
>
> On Thu, Nov 27, 2014 at 4:03 AM, Shangan Chen <chenshangan...@gmail.com>
> wrote:
>
> > my kafka version is kafka_2.10-0.8.1.1.jar
> >
> > *state-change log:*
> >
> > [2014-11-25 02:30:19,290] TRACE Controller 29 epoch 7 sending UpdateMetadata request (Leader:29,ISR:29,24,LeaderEpoch:10,ControllerEpoch:4) with correlationId 1803 to broker 20 for partition [org.nginx,32] (state.change.logger)
> >
> > *controller log:*
> >
> > [2014-11-22 09:17:02,327] [org.nginx,32] -> (Leader:29,ISR:29,24,LeaderEpoch:10,ControllerEpoch:4)
> >
> > *partition state in zookeeper:*
> >
> > [zk: localhost:2181(CONNECTED) 4] get /kafka08/brokers/topics/org.nginx/partitions/32/state
> > {"controller_epoch":6,"leader":29,"version":1,"leader_epoch":11,"isr":[29]}
> > cZxid = 0x5641824ee
> > ctime = Fri Oct 10 12:53:47 CST 2014
> > mZxid = 0x5a4c870b8
> > mtime = Sat Nov 22 06:20:27 CST 2014
> > pZxid = 0x5641824ee
> > cversion = 0
> > dataVersion = 19
> > aclVersion = 0
> > ephemeralOwner = 0x0
> > dataLength = 75
> > numChildren = 0
> >
> > Based on the above information, the controller and state-change logs have
> > the right information, but the partition state in zookeeper was not
> > updated and there was no attempt to update it.
> >
> > On Tue, Nov 25, 2014 at 1:28 PM, Jun Rao <jun...@gmail.com> wrote:
> >
> > > Which version of Kafka are you using? Any error in the controller and
> > > the state-change log?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Fri, Nov 21, 2014 at 5:59 PM, Shangan Chen <chenshangan...@gmail.com>
> > > wrote:
> > >
> > > > In the initial state all replicas are in the isr list, but sometimes
> > > > when I check the topic state, a replica can never become isr again
> > > > even if it is actually synchronized. I saw in the log that the leader
> > > > printed an expand-isr request, but it did not work. I found an
> > > > interesting thing: the shrink and expand requests happened just after
> > > > the controller switch. I don't know whether it is related, and the
> > > > controller log has been overwritten, so I cannot verify. Is there
> > > > anything I can do to trigger the isr update? Currently I alter the
> > > > zookeeper partition state and it works, but it really needs a lot of
> > > > manual work as I have quite a lot of topics in my cluster. Some useful
> > > > information is as follows.
> > > >
> > > > *my replica lag config for default:*
> > > >
> > > > replica.lag.time.max.ms=10000
> > > > replica.lag.max.messages=4000
> > > >
> > > > *controller info:*
> > > >
> > > > [zk: localhost:2181(CONNECTED) 4] get /kafka08/controller
> > > > {"version":1,"brokerid":29,"timestamp":"1416608404008"}
> > > > cZxid = 0x5a4c85923
> > > > ctime = Sat Nov 22 06:20:04 CST 2014
> > > > mZxid = 0x5a4c85923
> > > > mtime = Sat Nov 22 06:20:04 CST 2014
> > > > pZxid = 0x5a4c85923
> > > > cversion = 0
> > > > dataVersion = 0
> > > > aclVersion = 0
> > > > ephemeralOwner = 0x5477ba622cb6c7d
> > > > dataLength = 55
> > > > numChildren = 0
> > > >
> > > > *topic info:*
> > > >
> > > > Topic:org.nginx PartitionCount:48 ReplicationFactor:2 Configs:
> > > > Topic: org.nginx Partition: 0  Leader: 17 Replicas: 17,32 Isr: 17,32
> > > > Topic: org.nginx Partition: 1  Leader: 18 Replicas: 18,33 Isr: 18,33
> > > > Topic: org.nginx Partition: 2  Leader: 19 Replicas: 19,34 Isr: 34,19
> > > > Topic: org.nginx Partition: 3  Leader: 20 Replicas: 20,35 Isr: 35,20
> > > > Topic: org.nginx Partition: 4  Leader: 21 Replicas: 21,36 Isr: 21,36
> > > > Topic: org.nginx Partition: 5  Leader: 22 Replicas: 22,17 Isr: 17,22
> > > > Topic: org.nginx Partition: 6  Leader: 23 Replicas: 23,18 Isr: 18,23
> > > > Topic: org.nginx Partition: 7  Leader: 24 Replicas: 24,19 Isr: 24,19
> > > > Topic: org.nginx Partition: 8  Leader: 25 Replicas: 25,20 Isr: 25,20
> > > > Topic: org.nginx Partition: 9  Leader: 26 Replicas: 26,21 Isr: 26,21
> > > > Topic: org.nginx Partition: 10 Leader: 27 Replicas: 27,22 Isr: 27,22
> > > > Topic: org.nginx Partition: 11 Leader: 28 Replicas: 28,23 Isr: 28,23
> > > > Topic: org.nginx Partition: 12 Leader: 29 Replicas: 29,24 Isr: 29
> > > > Topic: org.nginx Partition: 13 Leader: 30 Replicas: 30,25 Isr: 30,25
> > > > Topic: org.nginx Partition: 14 Leader: 31 Replicas: 31,26 Isr: 26,31
> > > > Topic: org.nginx Partition: 15 Leader: 32 Replicas: 32,27 Isr: 27,32
> > > > Topic: org.nginx Partition: 16 Leader: 33 Replicas: 33,28 Isr: 33,28
> > > > Topic: org.nginx Partition: 17 Leader: 34 Replicas: 34,29 Isr: 29,34
> > > > Topic: org.nginx Partition: 18 Leader: 35 Replicas: 35,30 Isr: 30,35
> > > > Topic: org.nginx Partition: 19 Leader: 36 Replicas: 36,31 Isr: 31,36
> > > > Topic: org.nginx Partition: 20 Leader: 17 Replicas: 17,32 Isr: 17,32
> > > > Topic: org.nginx Partition: 21 Leader: 18 Replicas: 18,33 Isr: 18,33
> > > > Topic: org.nginx Partition: 22 Leader: 19 Replicas: 19,34 Isr: 34,19
> > > > Topic: org.nginx Partition: 23 Leader: 20 Replicas: 20,35 Isr: 35,20
> > > > Topic: org.nginx Partition: 24 Leader: 21 Replicas: 21,36 Isr: 21,36
> > > > Topic: org.nginx Partition: 25 Leader: 22 Replicas: 22,17 Isr: 17,22
> > > > Topic: org.nginx Partition: 26 Leader: 23 Replicas: 23,18 Isr: 18,23
> > > > Topic: org.nginx Partition: 27 Leader: 24 Replicas: 24,19 Isr: 24,19
> > > > Topic: org.nginx Partition: 28 Leader: 25 Replicas: 25,20 Isr: 25,20
> > > > Topic: org.nginx Partition: 29 Leader: 26 Replicas: 26,21 Isr: 26,21
> > > > Topic: org.nginx Partition: 30 Leader: 27 Replicas: 27,22 Isr: 27,22
> > > > Topic: org.nginx Partition: 31 Leader: 28 Replicas: 28,23 Isr: 28,23
> > > > Topic: org.nginx Partition: 32 Leader: 29 Replicas: 29,24 Isr: 29
> > > > Topic: org.nginx Partition: 33 Leader: 30 Replicas: 30,25 Isr: 30,25
> > > > Topic: org.nginx Partition: 34 Leader: 31 Replicas: 31,26 Isr: 26,31
> > > > Topic: org.nginx Partition: 35 Leader: 32 Replicas: 32,27 Isr: 27,32
> > > > Topic: org.nginx Partition: 36 Leader: 33 Replicas: 33,28 Isr: 33,28
> > > > Topic: org.nginx Partition: 37 Leader: 34 Replicas: 34,29 Isr: 29,34
> > > > Topic: org.nginx Partition: 38 Leader: 35 Replicas: 35,30 Isr: 30,35
> > > > Topic: org.nginx Partition: 39 Leader: 36 Replicas: 36,31 Isr: 31,36
> > > > Topic: org.nginx Partition: 40 Leader: 17 Replicas: 17,32 Isr: 17,32
> > > > Topic: org.nginx Partition: 41 Leader: 18 Replicas: 18,33 Isr: 33,18
> > > > Topic: org.nginx Partition: 42 Leader: 19 Replicas: 19,34 Isr: 34,19
> > > > Topic: org.nginx Partition: 43 Leader: 20 Replicas: 20,35 Isr: 35,20
> > > > Topic: org.nginx Partition: 44 Leader: 21 Replicas: 21,36 Isr: 21,36
> > > > Topic: org.nginx Partition: 45 Leader: 22 Replicas: 22,17 Isr: 17,22
> > > > Topic: org.nginx Partition: 46 Leader: 23 Replicas: 23,18 Isr: 18,23
> > > > Topic: org.nginx Partition: 47 Leader: 24 Replicas: 24,19 Isr: 24,19
> > > >
> > > > --
> > > > have a good day!
> > > > chenshang'an
> > > >
> >
> > --
> > have a good day!
> > chenshang'an
> >
>

--
have a good day!
chenshang'an