[ https://issues.apache.org/jira/browse/KAFKA-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743840#comment-15743840 ]
Michał Woś edited comment on KAFKA-1407 at 12/13/16 2:26 AM: ------------------------------------------------------------- [~dmitrybugaychenko] +1 The same issue in 0.8.2.0. Scenario 1. Broker looses session from zookeeper (because some reasons - nvm here) 2. Brokers and zookeeper establish new session 3. We get "Kafka scheduler has not been started" and loose messages from producer due "NotLeaderForPartitionException" on its side Kafka Broker logs: {noformat} 2016-12-10 10:28:22,924 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x25453ff793e91ed has expired, closing socket connection 2016-12-10 10:28:22,924 INFO org.I0Itec.zkclient.ZkClient: zookeeper state changed (Expired) 2016-12-10 10:28:22,924 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=machine_address2, machine_address3, machine_address4 sessionTimeout=15000 watcher=org.I0Itec.zkclient.ZkClient@145eaa29 2016-12-10 10:28:22,936 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down 2016-12-10 10:28:22,938 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server machine_address1. Will not attempt to authenticate using SASL (unknown error) 2016-12-10 10:28:22,940 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to machine_address1, initiating session 2016-12-10 10:28:22,947 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server machine_address1, sessionid = 0x65453fead67efb4, negotiated timeout = 15000 2016-12-10 10:28:22,947 INFO org.I0Itec.zkclient.ZkClient: zookeeper state changed (SyncConnected) 2016-12-10 10:28:23,014 INFO kafka.log.OffsetIndex: Deleting index /home/data/kafka/topic_name-0/00000000034872197725.index.deleted 2016-12-10 10:28:23,040 INFO kafka.log.OffsetIndex: Deleting index /home/data/kafka/topic_name-0/00000000034871445334.index.deleted 2016-12-10 10:28:23,183 INFO kafka.log.OffsetIndex: Deleting index /home/data/kafka/topic_name-0/00000000034882006637.index.deleted 2016-12-10 10:28:23,188 INFO kafka.log.OffsetIndex: Deleting index /home/data/kafka/topic_name-0/00000000034875965334.index.deleted 2016-12-10 10:28:23,277 INFO kafka.controller.KafkaController$SessionExpirationListener: [SessionExpirationListener on 2110], ZK expired; shut down all controller components and try to re-elect 2016-12-10 10:28:23,319 ERROR org.I0Itec.zkclient.ZkEventThread: Error handling event ZkEvent[New session event sent to kafka.controller.KafkaController$SessionExpirationListener@70ee82ce] java.lang.IllegalStateException: Kafka scheduler has not been started at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114) at kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86) at kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:350) at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSessionapply$mcZ$sp(KafkaController.scala:1108) at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSessionapply(KafkaController.scala:1107) at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSessionapply(KafkaController.scala:1107) at kafka.utils.Utils$.inLock(Utils.scala:561) at kafka.controller.KafkaController$SessionExpirationListener.handleNewSession(KafkaController.scala:1107) at org.I0Itec.zkclient.ZkClientrun(ZkClient.java:472) at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) 2016-12-10 10:28:23,320 INFO kafka.server.KafkaHealthcheck: re-registering broker info in ZK for broker 2110 2016-12-10 10:28:23,367 INFO kafka.utils.ZkUtils$: Registered broker 2110 at path /brokers/ids/2110 with address machine_address. 2016-12-10 10:28:23,370 INFO kafka.server.KafkaHealthcheck: done re-registering broker 2016-12-10 10:28:23,380 INFO kafka.server.KafkaHealthcheck: Subscribing to /brokers/topics path to watch for new topics 2016-12-10 10:28:23,456 INFO kafka.server.ZookeeperLeaderElector$LeaderChangeListener: New leader is 828 2016-12-10 10:28:24,820 INFO kafka.log.Log: Rolled new log segment for 'Dispatcher.client_calls-0' in 1 ms. {noformat} Meantime, after session is established, my producer gets for each messege: {noformat} org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition {noformat} and messages are lost. was (Author: wosiu): [~dmitrybugaychenko] +1 The same issue in 0.8.2.0. Scenario 1. Broker looses session from zookeeper (because some reasons - nvm here) 2. Brokers and zookeeper establish new session 3. We get "Kafka scheduler has not been started" and loose messages from producer due "NotLeaderException" {noformat} 2016-12-10 10:28:22,924 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x25453ff793e91ed has expired, closing socket connection 2016-12-10 10:28:22,924 INFO org.I0Itec.zkclient.ZkClient: zookeeper state changed (Expired) 2016-12-10 10:28:22,924 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=machine_address2, machine_address3, machine_address4 sessionTimeout=15000 watcher=org.I0Itec.zkclient.ZkClient@145eaa29 2016-12-10 10:28:22,936 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down 2016-12-10 10:28:22,938 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server machine_address1. Will not attempt to authenticate using SASL (unknown error) 2016-12-10 10:28:22,940 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to machine_address1, initiating session 2016-12-10 10:28:22,947 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server machine_address1, sessionid = 0x65453fead67efb4, negotiated timeout = 15000 2016-12-10 10:28:22,947 INFO org.I0Itec.zkclient.ZkClient: zookeeper state changed (SyncConnected) 2016-12-10 10:28:23,014 INFO kafka.log.OffsetIndex: Deleting index /home/data/kafka/topic_name-0/00000000034872197725.index.deleted 2016-12-10 10:28:23,040 INFO kafka.log.OffsetIndex: Deleting index /home/data/kafka/topic_name-0/00000000034871445334.index.deleted 2016-12-10 10:28:23,183 INFO kafka.log.OffsetIndex: Deleting index /home/data/kafka/topic_name-0/00000000034882006637.index.deleted 2016-12-10 10:28:23,188 INFO kafka.log.OffsetIndex: Deleting index /home/data/kafka/topic_name-0/00000000034875965334.index.deleted 2016-12-10 10:28:23,277 INFO kafka.controller.KafkaController$SessionExpirationListener: [SessionExpirationListener on 2110], ZK expired; shut down all controller components and try to re-elect 2016-12-10 10:28:23,319 ERROR org.I0Itec.zkclient.ZkEventThread: Error handling event ZkEvent[New session event sent to kafka.controller.KafkaController$SessionExpirationListener@70ee82ce] java.lang.IllegalStateException: Kafka scheduler has not been started at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114) at kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86) at kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:350) at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSessionapply$mcZ$sp(KafkaController.scala:1108) at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSessionapply(KafkaController.scala:1107) at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSessionapply(KafkaController.scala:1107) at kafka.utils.Utils$.inLock(Utils.scala:561) at kafka.controller.KafkaController$SessionExpirationListener.handleNewSession(KafkaController.scala:1107) at org.I0Itec.zkclient.ZkClientrun(ZkClient.java:472) at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) 2016-12-10 10:28:23,320 INFO kafka.server.KafkaHealthcheck: re-registering broker info in ZK for broker 2110 2016-12-10 10:28:23,367 INFO kafka.utils.ZkUtils$: Registered broker 2110 at path /brokers/ids/2110 with address machine_address. 2016-12-10 10:28:23,370 INFO kafka.server.KafkaHealthcheck: done re-registering broker 2016-12-10 10:28:23,380 INFO kafka.server.KafkaHealthcheck: Subscribing to /brokers/topics path to watch for new topics 2016-12-10 10:28:23,456 INFO kafka.server.ZookeeperLeaderElector$LeaderChangeListener: New leader is 828 2016-12-10 10:28:24,820 INFO kafka.log.Log: Rolled new log segment for 'Dispatcher.client_calls-0' in 1 ms. {noformat} Meantime, after session is established, my producer gets for each messege: {noformat} org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition {noformat} and messages are lost. > Broker can not return to ISR because of BadVersionException > ----------------------------------------------------------- > > Key: KAFKA-1407 > URL: https://issues.apache.org/jira/browse/KAFKA-1407 > Project: Kafka > Issue Type: Bug > Components: controller > Affects Versions: 0.8.1 > Reporter: Dmitry Bugaychenko > Assignee: Neha Narkhede > > Each morning we found a broker out of ISR at stuck with log full of messages: > {code} > INFO | jvm 1 | 2014/04/21 08:36:21 | [2014-04-21 09:36:21,907] ERROR > Conditional update of path /brokers/topics/topic2/partitions/1/state with > data > {"controller_epoch":46,"leader":2,"version":1,"leader_epoch":38,"isr":[2]} > and expected version 53 failed due to > org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = > BadVersion for /brokers/topics/topic2/partitions/1/state > (kafka.utils.ZkUtils$) > INFO | jvm 1 | 2014/04/21 08:36:21 | [2014-04-21 09:36:21,907] INFO > Partition [topic2,1] on broker 2: Cached zkVersion [53] not equal to that in > zookeeper, skip updating ISR (kafka.cluster.Partition) > {code} > It seems that it can not recover after short netwrok break down and the only > way to return it is restart it using kill -9. -- This message was sent by Atlassian JIRA (v6.3.4#6332)