[ https://issues.apache.org/jira/browse/KAFKA-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743840#comment-15743840 ]
Michał Woś edited comment on KAFKA-1407 at 12/13/16 2:25 AM: ------------------------------------------------------------- [~dmitrybugaychenko] +1 The same issue in 0.8.2.0. Scenario 1. Broker looses session from zookeeper (because some reasons - nvm here) 2. Brokers and zookeeper establish new session 3. We get "Kafka scheduler has not been started" and loose messages from producer due "NotLeaderException" {noformat} 2016-12-10 10:28:22,924 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x25453ff793e91ed has expired, closing socket connection 2016-12-10 10:28:22,924 INFO org.I0Itec.zkclient.ZkClient: zookeeper state changed (Expired) 2016-12-10 10:28:22,924 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=machine_address2, machine_address3, machine_address4 sessionTimeout=15000 watcher=org.I0Itec.zkclient.ZkClient@145eaa29 2016-12-10 10:28:22,936 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down 2016-12-10 10:28:22,938 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server machine_address1. Will not attempt to authenticate using SASL (unknown error) 2016-12-10 10:28:22,940 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to machine_address1, initiating session 2016-12-10 10:28:22,947 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server machine_address1, sessionid = 0x65453fead67efb4, negotiated timeout = 15000 2016-12-10 10:28:22,947 INFO org.I0Itec.zkclient.ZkClient: zookeeper state changed (SyncConnected) 2016-12-10 10:28:23,014 INFO kafka.log.OffsetIndex: Deleting index /home/data/kafka/topic_name-0/00000000034872197725.index.deleted 2016-12-10 10:28:23,040 INFO kafka.log.OffsetIndex: Deleting index /home/data/kafka/topic_name-0/00000000034871445334.index.deleted 2016-12-10 10:28:23,183 INFO kafka.log.OffsetIndex: Deleting index /home/data/kafka/topic_name-0/00000000034882006637.index.deleted 2016-12-10 10:28:23,188 INFO kafka.log.OffsetIndex: Deleting index /home/data/kafka/topic_name-0/00000000034875965334.index.deleted 2016-12-10 10:28:23,277 INFO kafka.controller.KafkaController$SessionExpirationListener: [SessionExpirationListener on 2110], ZK expired; shut down all controller components and try to re-elect 2016-12-10 10:28:23,319 ERROR org.I0Itec.zkclient.ZkEventThread: Error handling event ZkEvent[New session event sent to kafka.controller.KafkaController$SessionExpirationListener@70ee82ce] java.lang.IllegalStateException: Kafka scheduler has not been started at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114) at kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86) at kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:350) at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSessionapply$mcZ$sp(KafkaController.scala:1108) at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSessionapply(KafkaController.scala:1107) at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSessionapply(KafkaController.scala:1107) at kafka.utils.Utils$.inLock(Utils.scala:561) at kafka.controller.KafkaController$SessionExpirationListener.handleNewSession(KafkaController.scala:1107) at org.I0Itec.zkclient.ZkClientrun(ZkClient.java:472) at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) 2016-12-10 10:28:23,320 INFO kafka.server.KafkaHealthcheck: re-registering broker info in ZK for broker 2110 2016-12-10 10:28:23,367 INFO kafka.utils.ZkUtils$: Registered broker 2110 at path /brokers/ids/2110 with address machine_address. 2016-12-10 10:28:23,370 INFO kafka.server.KafkaHealthcheck: done re-registering broker 2016-12-10 10:28:23,380 INFO kafka.server.KafkaHealthcheck: Subscribing to /brokers/topics path to watch for new topics 2016-12-10 10:28:23,456 INFO kafka.server.ZookeeperLeaderElector$LeaderChangeListener: New leader is 828 2016-12-10 10:28:24,820 INFO kafka.log.Log: Rolled new log segment for 'Dispatcher.client_calls-0' in 1 ms. {noformat} Meantime, after session is established, my producer gets for each messege: {noformat} org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition {noformat} and messages are lost. was (Author: wosiu): [~dmitrybugaychenko] +1 The same issue in 0.8.2.0. Scenario 1. Broker looses session from zookeeper (because some reasons - nvm here) 2. Brokers and zookeeper establish new session and this happens: {noformat} 2016-12-10 10:28:22,924 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x25453ff793e91ed has expired, closing socket connection 2016-12-10 10:28:22,924 INFO org.I0Itec.zkclient.ZkClient: zookeeper state changed (Expired) 2016-12-10 10:28:22,924 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=machine_address2, machine_address3, machine_address4 sessionTimeout=15000 watcher=org.I0Itec.zkclient.ZkClient@145eaa29 2016-12-10 10:28:22,936 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down 2016-12-10 10:28:22,938 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server machine_address1. Will not attempt to authenticate using SASL (unknown error) 2016-12-10 10:28:22,940 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to machine_address1, initiating session 2016-12-10 10:28:22,947 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server machine_address1, sessionid = 0x65453fead67efb4, negotiated timeout = 15000 2016-12-10 10:28:22,947 INFO org.I0Itec.zkclient.ZkClient: zookeeper state changed (SyncConnected) 2016-12-10 10:28:23,014 INFO kafka.log.OffsetIndex: Deleting index /home/data/kafka/topic_name-0/00000000034872197725.index.deleted 2016-12-10 10:28:23,040 INFO kafka.log.OffsetIndex: Deleting index /home/data/kafka/topic_name-0/00000000034871445334.index.deleted 2016-12-10 10:28:23,183 INFO kafka.log.OffsetIndex: Deleting index /home/data/kafka/topic_name-0/00000000034882006637.index.deleted 2016-12-10 10:28:23,188 INFO kafka.log.OffsetIndex: Deleting index /home/data/kafka/topic_name-0/00000000034875965334.index.deleted 2016-12-10 10:28:23,277 INFO kafka.controller.KafkaController$SessionExpirationListener: [SessionExpirationListener on 2110], ZK expired; shut down all controller components and try to re-elect 2016-12-10 10:28:23,319 ERROR org.I0Itec.zkclient.ZkEventThread: Error handling event ZkEvent[New session event sent to kafka.controller.KafkaController$SessionExpirationListener@70ee82ce] java.lang.IllegalStateException: Kafka scheduler has not been started at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114) at kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86) at kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:350) at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSessionapply$mcZ$sp(KafkaController.scala:1108) at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSessionapply(KafkaController.scala:1107) at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSessionapply(KafkaController.scala:1107) at kafka.utils.Utils$.inLock(Utils.scala:561) at kafka.controller.KafkaController$SessionExpirationListener.handleNewSession(KafkaController.scala:1107) at org.I0Itec.zkclient.ZkClientrun(ZkClient.java:472) at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) 2016-12-10 10:28:23,320 INFO kafka.server.KafkaHealthcheck: re-registering broker info in ZK for broker 2110 2016-12-10 10:28:23,367 INFO kafka.utils.ZkUtils$: Registered broker 2110 at path /brokers/ids/2110 with address machine_address. 2016-12-10 10:28:23,370 INFO kafka.server.KafkaHealthcheck: done re-registering broker 2016-12-10 10:28:23,380 INFO kafka.server.KafkaHealthcheck: Subscribing to /brokers/topics path to watch for new topics 2016-12-10 10:28:23,456 INFO kafka.server.ZookeeperLeaderElector$LeaderChangeListener: New leader is 828 2016-12-10 10:28:24,820 INFO kafka.log.Log: Rolled new log segment for 'Dispatcher.client_calls-0' in 1 ms. {noformat} Meantime, after session is established, my producer gets for each messege: {noformat} org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition {noformat} and messages are lost. > Broker can not return to ISR because of BadVersionException > ----------------------------------------------------------- > > Key: KAFKA-1407 > URL: https://issues.apache.org/jira/browse/KAFKA-1407 > Project: Kafka > Issue Type: Bug > Components: controller > Affects Versions: 0.8.1 > Reporter: Dmitry Bugaychenko > Assignee: Neha Narkhede > > Each morning we found a broker out of ISR at stuck with log full of messages: > {code} > INFO | jvm 1 | 2014/04/21 08:36:21 | [2014-04-21 09:36:21,907] ERROR > Conditional update of path /brokers/topics/topic2/partitions/1/state with > data > {"controller_epoch":46,"leader":2,"version":1,"leader_epoch":38,"isr":[2]} > and expected version 53 failed due to > org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = > BadVersion for /brokers/topics/topic2/partitions/1/state > (kafka.utils.ZkUtils$) > INFO | jvm 1 | 2014/04/21 08:36:21 | [2014-04-21 09:36:21,907] INFO > Partition [topic2,1] on broker 2: Cached zkVersion [53] not equal to that in > zookeeper, skip updating ISR (kafka.cluster.Partition) > {code} > It seems that it can not recover after short netwrok break down and the only > way to return it is restart it using kill -9. -- This message was sent by Atlassian JIRA (v6.3.4#6332)