[ 
https://issues.apache.org/jira/browse/KAFKA-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743840#comment-15743840
 ] 

Michał Woś edited comment on KAFKA-1407 at 12/13/16 2:27 AM:
-------------------------------------------------------------

[~dmitrybugaychenko] +1
The same issue in 0.8.2.0. Scenario
1. Broker looses session from zookeeper (because some reasons - nvm here)
2. Brokers and zookeeper establish new session
3. We get "Kafka scheduler has not been started" and loose messages from 
producer due "NotLeaderForPartitionException" on its side
There are many "Deleting index" before, after... Not sure if that matters.
Kafka Broker logs:
{noformat}
2016-12-10 10:28:22,924 INFO org.apache.zookeeper.ClientCnxn: Unable to 
reconnect to ZooKeeper service, session 0x25453ff793e91ed has expired, closing 
socket connection
2016-12-10 10:28:22,924 INFO org.I0Itec.zkclient.ZkClient: zookeeper state 
changed (Expired)
2016-12-10 10:28:22,924 INFO org.apache.zookeeper.ZooKeeper: Initiating client 
connection, connectString=machine_address2, machine_address3, machine_address4 
sessionTimeout=15000 watcher=org.I0Itec.zkclient.ZkClient@145eaa29
2016-12-10 10:28:22,936 INFO org.apache.zookeeper.ClientCnxn: EventThread shut 
down
2016-12-10 10:28:22,938 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
connection to server machine_address1. Will not attempt to authenticate using 
SASL (unknown error)
2016-12-10 10:28:22,940 INFO org.apache.zookeeper.ClientCnxn: Socket connection 
established to machine_address1, initiating session
2016-12-10 10:28:22,947 INFO org.apache.zookeeper.ClientCnxn: Session 
establishment complete on server machine_address1, sessionid = 
0x65453fead67efb4, negotiated timeout = 15000
2016-12-10 10:28:22,947 INFO org.I0Itec.zkclient.ZkClient: zookeeper state 
changed (SyncConnected)

2016-12-10 10:28:23,014 INFO kafka.log.OffsetIndex: Deleting index 
/home/data/kafka/topic_name-0/00000000034872197725.index.deleted
2016-12-10 10:28:23,040 INFO kafka.log.OffsetIndex: Deleting index 
/home/data/kafka/topic_name-0/00000000034871445334.index.deleted
2016-12-10 10:28:23,183 INFO kafka.log.OffsetIndex: Deleting index 
/home/data/kafka/topic_name-0/00000000034882006637.index.deleted
2016-12-10 10:28:23,188 INFO kafka.log.OffsetIndex: Deleting index 
/home/data/kafka/topic_name-0/00000000034875965334.index.deleted

2016-12-10 10:28:23,277 INFO 
kafka.controller.KafkaController$SessionExpirationListener: 
[SessionExpirationListener on 2110], ZK expired; shut down all controller 
components and try to re-elect
2016-12-10 10:28:23,319 ERROR org.I0Itec.zkclient.ZkEventThread: Error handling 
event ZkEvent[New session event sent to 
kafka.controller.KafkaController$SessionExpirationListener@70ee82ce]
java.lang.IllegalStateException: Kafka scheduler has not been started
        at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114)
        at kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86)
        at 
kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:350)
        at 
kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSessionapply$mcZ$sp(KafkaController.scala:1108)
        at 
kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSessionapply(KafkaController.scala:1107)
        at 
kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSessionapply(KafkaController.scala:1107)
        at kafka.utils.Utils$.inLock(Utils.scala:561)
        at 
kafka.controller.KafkaController$SessionExpirationListener.handleNewSession(KafkaController.scala:1107)
        at org.I0Itec.zkclient.ZkClientrun(ZkClient.java:472)
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
2016-12-10 10:28:23,320 INFO kafka.server.KafkaHealthcheck: re-registering 
broker info in ZK for broker 2110
2016-12-10 10:28:23,367 INFO kafka.utils.ZkUtils$: Registered broker 2110 at 
path /brokers/ids/2110 with address machine_address.
2016-12-10 10:28:23,370 INFO kafka.server.KafkaHealthcheck: done re-registering 
broker
2016-12-10 10:28:23,380 INFO kafka.server.KafkaHealthcheck: Subscribing to 
/brokers/topics path to watch for new topics
2016-12-10 10:28:23,456 INFO 
kafka.server.ZookeeperLeaderElector$LeaderChangeListener: New leader is 828
2016-12-10 10:28:24,820 INFO kafka.log.Log: Rolled new log segment for 
'Dispatcher.client_calls-0' in 1 ms.
{noformat}
Meantime, after session is established, my producer gets for each messege:
{noformat}
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
not the leader for that topic-partition
{noformat}
and messages are lost.



was (Author: wosiu):
[~dmitrybugaychenko] +1
The same issue in 0.8.2.0. Scenario
1. Broker looses session from zookeeper (because some reasons - nvm here)
2. Brokers and zookeeper establish new session
3. We get "Kafka scheduler has not been started" and loose messages from 
producer due "NotLeaderForPartitionException" on its side
Kafka Broker logs:
{noformat}
2016-12-10 10:28:22,924 INFO org.apache.zookeeper.ClientCnxn: Unable to 
reconnect to ZooKeeper service, session 0x25453ff793e91ed has expired, closing 
socket connection
2016-12-10 10:28:22,924 INFO org.I0Itec.zkclient.ZkClient: zookeeper state 
changed (Expired)
2016-12-10 10:28:22,924 INFO org.apache.zookeeper.ZooKeeper: Initiating client 
connection, connectString=machine_address2, machine_address3, machine_address4 
sessionTimeout=15000 watcher=org.I0Itec.zkclient.ZkClient@145eaa29
2016-12-10 10:28:22,936 INFO org.apache.zookeeper.ClientCnxn: EventThread shut 
down
2016-12-10 10:28:22,938 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
connection to server machine_address1. Will not attempt to authenticate using 
SASL (unknown error)
2016-12-10 10:28:22,940 INFO org.apache.zookeeper.ClientCnxn: Socket connection 
established to machine_address1, initiating session
2016-12-10 10:28:22,947 INFO org.apache.zookeeper.ClientCnxn: Session 
establishment complete on server machine_address1, sessionid = 
0x65453fead67efb4, negotiated timeout = 15000
2016-12-10 10:28:22,947 INFO org.I0Itec.zkclient.ZkClient: zookeeper state 
changed (SyncConnected)

2016-12-10 10:28:23,014 INFO kafka.log.OffsetIndex: Deleting index 
/home/data/kafka/topic_name-0/00000000034872197725.index.deleted
2016-12-10 10:28:23,040 INFO kafka.log.OffsetIndex: Deleting index 
/home/data/kafka/topic_name-0/00000000034871445334.index.deleted
2016-12-10 10:28:23,183 INFO kafka.log.OffsetIndex: Deleting index 
/home/data/kafka/topic_name-0/00000000034882006637.index.deleted
2016-12-10 10:28:23,188 INFO kafka.log.OffsetIndex: Deleting index 
/home/data/kafka/topic_name-0/00000000034875965334.index.deleted

2016-12-10 10:28:23,277 INFO 
kafka.controller.KafkaController$SessionExpirationListener: 
[SessionExpirationListener on 2110], ZK expired; shut down all controller 
components and try to re-elect
2016-12-10 10:28:23,319 ERROR org.I0Itec.zkclient.ZkEventThread: Error handling 
event ZkEvent[New session event sent to 
kafka.controller.KafkaController$SessionExpirationListener@70ee82ce]
java.lang.IllegalStateException: Kafka scheduler has not been started
        at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114)
        at kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86)
        at 
kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:350)
        at 
kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSessionapply$mcZ$sp(KafkaController.scala:1108)
        at 
kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSessionapply(KafkaController.scala:1107)
        at 
kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSessionapply(KafkaController.scala:1107)
        at kafka.utils.Utils$.inLock(Utils.scala:561)
        at 
kafka.controller.KafkaController$SessionExpirationListener.handleNewSession(KafkaController.scala:1107)
        at org.I0Itec.zkclient.ZkClientrun(ZkClient.java:472)
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
2016-12-10 10:28:23,320 INFO kafka.server.KafkaHealthcheck: re-registering 
broker info in ZK for broker 2110
2016-12-10 10:28:23,367 INFO kafka.utils.ZkUtils$: Registered broker 2110 at 
path /brokers/ids/2110 with address machine_address.
2016-12-10 10:28:23,370 INFO kafka.server.KafkaHealthcheck: done re-registering 
broker
2016-12-10 10:28:23,380 INFO kafka.server.KafkaHealthcheck: Subscribing to 
/brokers/topics path to watch for new topics
2016-12-10 10:28:23,456 INFO 
kafka.server.ZookeeperLeaderElector$LeaderChangeListener: New leader is 828
2016-12-10 10:28:24,820 INFO kafka.log.Log: Rolled new log segment for 
'Dispatcher.client_calls-0' in 1 ms.
{noformat}
Meantime, after session is established, my producer gets for each messege:
{noformat}
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
not the leader for that topic-partition
{noformat}
and messages are lost.


> Broker can not return to ISR because of BadVersionException
> -----------------------------------------------------------
>
>                 Key: KAFKA-1407
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1407
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.8.1
>            Reporter: Dmitry Bugaychenko
>            Assignee: Neha Narkhede
>
> Each morning we found a broker out of ISR at stuck with log full of messages:
> {code}
> INFO   | jvm 1    | 2014/04/21 08:36:21 | [2014-04-21 09:36:21,907] ERROR 
> Conditional update of path /brokers/topics/topic2/partitions/1/state with 
> data 
> {"controller_epoch":46,"leader":2,"version":1,"leader_epoch":38,"isr":[2]} 
> and expected version 53 failed due to 
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = 
> BadVersion for /brokers/topics/topic2/partitions/1/state 
> (kafka.utils.ZkUtils$)
> INFO   | jvm 1    | 2014/04/21 08:36:21 | [2014-04-21 09:36:21,907] INFO 
> Partition [topic2,1] on broker 2: Cached zkVersion [53] not equal to that in 
> zookeeper, skip updating ISR (kafka.cluster.Partition)
> {code}
> It seems that it can not recover after short netwrok break down and the only 
> way to return it is restart it using kill -9.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to