[ https://issues.apache.org/jira/browse/KAFKA-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anil reopened KAFKA-6281: ------------------------- Thanks [~omkreddy], We've got this issue again today, twice. I see that both the times, there is GC running in my server1 , which I'm connecting to. But it seems to be of short duration. Please suggest, if that could be the cause. Regarding the network connection issue(s), b/w which components there could be issue ? I mean the zoo-keeper of server1, couldnot connect to zoo-keeper of server2 ? . We haven't found any connectivity issues, but still want to make sure on the ports and availablitiy. {code:java} Exception 1 : -2018-06-04T05:03:35,967 [kafka-producer-network-thread | producer-1] ERROR -org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. GC 1: 2018-06-04T05:03:24.865-0400: 1465075.225: [GC pause (G1 Evacuation Pause) (young), 0.0158735 secs] [Parallel Time: 15.3 ms, GC Workers: 2] [GC Worker Start (ms): Min: 1465075225.5, Avg: 1465075225.5, Max: 1465075225.5, Diff: 0.0] [Ext Root Scanning (ms): Min: 0.7, Avg: 0.7, Max: 0.7, Diff: 0.1, Sum: 1.4] [Update RS (ms): Min: 2.1, Avg: 2.3, Max: 2.4, Diff: 0.2, Sum: 4.5] [Processed Buffers: Min: 19, Avg: 20.5, Max: 22, Diff: 3, Sum: 41] [Scan RS (ms): Min: 3.6, Avg: 3.7, Max: 3.9, Diff: 0.2, Sum: 7.5] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Object Copy (ms): Min: 8.5, Avg: 8.5, Max: 8.6, Diff: 0.1, Sum: 17.0] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [GC Worker Total (ms): Min: 15.2, Avg: 15.2, Max: 15.2, Diff: 0.0, Sum: 30.4] [GC Worker End (ms): Min: 1465075240.7, Avg: 1465075240.7, Max: 1465075240.7, Diff: 0.0] [Code Root Fixup: 0.1 ms] [Code Root Purge: 0.0 ms] [Clear CT: 0.1 ms] [Other: 0.5 ms] [Choose CSet: 0.0 ms] [Ref Proc: 0.1 ms] [Ref Enq: 0.0 ms] [Redirty Cards: 0.1 ms] [Humongous Register: 0.0 ms] [Humongous Reclaim: 0.0 ms] [Free CSet: 0.1 ms] [Eden: 46.0M(46.0M)->0.0B(100.0M) Survivors: 5120.0K->6144.0K Heap: 262.0M(1024.0M)->217.0M(1024.0M)] [Times: user=0.03 sys=0.00, real=0.02 secs] --Exception 2 -- -2018-06-04T10:15:15,522 [kafka-producer-network-thread | producer-1] ERROR -org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. --GC--2 2018-05-18T10:15:11.501-0400: 14981.861: [GC pause (G1 Evacuation Pause) (young), 0.0205628 secs] [Parallel Time: 20.0 ms, GC Workers: 2] [GC Worker Start (ms): Min: 14981861.1, Avg: 14981861.1, Max: 14981861.1, Diff: 0.0] [Ext Root Scanning (ms): Min: 0.6, Avg: 0.7, Max: 0.7, Diff: 0.0, Sum: 1.3] [Update RS (ms): Min: 1.6, Avg: 1.6, Max: 1.6, Diff: 0.0, Sum: 3.1] [Processed Buffers: Min: 12, Avg: 18.5, Max: 25, Diff: 13, Sum: 37] [Scan RS (ms): Min: 2.6, Avg: 2.6, Max: 2.6, Diff: 0.0, Sum: 5.2] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Object Copy (ms): Min: 15.1, Avg: 15.1, Max: 15.1, Diff: 0.0, Sum: 30.2] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [GC Worker Total (ms): Min: 19.9, Avg: 19.9, Max: 19.9, Diff: 0.0, Sum: 39.8] [GC Worker End (ms): Min: 14981881.0, Avg: 14981881.0, Max: 14981881.0, Diff: 0.0] [Code Root Fixup: 0.0 ms] [Code Root Purge: 0.0 ms] [Clear CT: 0.0 ms] [Other: 0.5 ms] [Choose CSet: 0.0 ms] [Ref Proc: 0.1 ms] [Ref Enq: 0.0 ms] [Redirty Cards: 0.1 ms] [Humongous Register: 0.0 ms] [Humongous Reclaim: 0.0 ms] [Free CSet: 0.1 ms] [Eden: 63.0M(63.0M)->0.0B(79.0M) Survivors: 8192.0K->6144.0K Heap: 425.0M(1024.0M)->364.5M(1024.0M)] [Times: user=0.04 sys=0.00, real=0.02 secs] {code} > Kafka JavaAPI Producer failed with NotLeaderForPartitionException > ----------------------------------------------------------------- > > Key: KAFKA-6281 > URL: https://issues.apache.org/jira/browse/KAFKA-6281 > Project: Kafka > Issue Type: Bug > Reporter: Anil > Priority: Major > Attachments: server1-controller.log, server2-controller.log > > > We are running Kafka (vesion kafka_2.11-0.10.1.0) in a 2 node cluster. We > have 2 producers (Java API) acting on different topics. Each topic has single > partition. The topic where we had this issue, has one consumer running. This > set up has been running fine for 3 months, and we saw this issue. All the > suggested cases/solutions for this issue in other forums don't seem to apply > for my scenario. > Exception at producer; > {code} > -2017-11-25T17:40:33,035 [kafka-producer-network-thread | producer-1] ERROR > client.producer.BingLogProducerCallback - Encountered exception in sending > message ; > org.apache.kafka.common.errors.NotLeaderForPartitionException: > This server is not the leader for that topic-partition. > {code} > We haven't enabled retries for the messages, because this is transactional > data and we want to maintain the order. > Producer config: > {code} > bootstrap.servers : server1ip:9092 > acks :all > retries : 0 > linger.ms :0 > buffer.memory :10240000 > max.request.size :1024000 > key.serializer : org.apache.kafka.common.serialization.StringSerializer > value.serializer : org.apache.kafka.common.serialization.StringSerializer > {code} > We are connecting to server1 at both producer and consumer. The controller > log at server2 indicates there is some shutdown happened at during sametime, > but I dont understand why this happened. > {color:red}[2017-11-25 17:31:44,776] DEBUG [Controller 2]: topics not in > preferred replica Map() (kafka.controller.KafkaController) [2017-11-25 > 17:31:44,776] TRACE [Controller 2]: leader imbalance ratio for broker 2 is > 0.000000 (kafka.controller.KafkaController) [2017-11-25 17:31:44,776] DEBUG > [Controller 2]: topics not in preferred replica Map() > (kafka.controller.KafkaController) [2017-11-25 17:31:44,776] TRACE > [Controller 2]: leader imbalance ratio for broker 1 is 0.000000 > (kafka.controller.KafkaController) [2017-11-25 17:34:18,314] INFO > [SessionExpirationListener on 2], ZK expired; shut down all controller > components and try to re-elect > (kafka.controller.KafkaController$SessionExpirationListener) [2017-11-25 > 17:34:18,317] DEBUG [Controller 2]: Controller resigning, broker id 2 > (kafka.controller.KafkaController) [2017-11-25 17:34:18,317] DEBUG > [Controller 2]: De-registering IsrChangeNotificationListener > (kafka.controller.KafkaController) [2017-11-25 17:34:18,317] INFO > [delete-topics-thread-2], Shutting down > (kafka.controller.TopicDeletionManager$DeleteTopicsThread) [2017-11-25 > 17:34:18,317] INFO [delete-topics-thread-2], Stopped > (kafka.controller.TopicDeletionManager$DeleteTopicsThread) [2017-11-25 > 17:34:18,318] INFO [delete-topics-thread-2], Shutdown completed > (kafka.controller.TopicDeletionManager$DeleteTopicsThread) [2017-11-25 > 17:34:18,318] INFO [Partition state machine on Controller 2]: Stopped > partition state machine (kafka.controller.PartitionStateMachine) [2017-11-25 > 17:34:18,318] INFO [Replica state machine on controller 2]: Stopped replica > state machine (kafka.controller.ReplicaStateMachine) [2017-11-25 > 17:34:18,318] INFO [Controller-2-to-broker-2-send-thread], Shutting down > (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,318] INFO > [Controller-2-to-broker-2-send-thread], Stopped > (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO > [Controller-2-to-broker-2-send-thread], Shutdown completed > (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO > [Controller-2-to-broker-1-send-thread], Shutting down > (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO > [Controller-2-to-broker-1-send-thread], Stopped > (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO > [Controller-2-to-broker-1-send-thread], Shutdown completed > (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO > [Controller 2]: Broker 2 resigned as the controller > (kafka.controller.KafkaController) [2017-11-25 17:34:18,353] DEBUG > [IsrChangeNotificationListener] Fired!!! > (kafka.controller.IsrChangeNotificationListener) [2017-11-25 17:34:18,353] > DEBUG [IsrChangeNotificationListener] Fired!!! > (kafka.controller.IsrChangeNotificationListener) [2017-11-25 17:34:18,354] > INFO [BrokerChangeListener on Controller 2]: Broker change listener fired for > path /brokers/ids with children 1,2 > (kafka.controller.ReplicaStateMachine$BrokerChangeListener) [2017-11-25 > 17:34:18,355] DEBUG [DeleteTopicsListener on 2]: Delete topics listener fired > for topics to be deleted > (kafka.controller.PartitionStateMachine$DeleteTopicsListener) [2017-11-25 > 17:34:18,362] INFO [AddPartitionsListener on 2]: Partition modification > triggered {"version":1,"partitions":{"0":[1]}} for path /brokers/topics/ESQ > (kafka.controller.PartitionStateMachine$PartitionModificationsListener) > [2017-11-25 17:34:18,368] INFO [AddPartitionsListener on 2]: Partition > modification triggered {"version":1,"partitions":{"0":[1]}} for path > /brokers/topics/Test1 > (kafka.controller.PartitionStateMachine$PartitionModificationsListener) > [2017-11-25 17:34:18,369] INFO [AddPartitionsListener on 2]: Partition > modification triggered {"version":1,"partitions":{"0":[2]}} for path > /brokers/topics/ImageQ > (kafka.controller.PartitionStateMachine$PartitionModificationsListener) > [2017-11-25 17:34:18,374] INFO [AddPartitionsListener on 2]: Partition > modification triggered > {"version":1,"partitions":{"8":[1,2],"4":[1,2],"9":[2,1],"5":[2,1],"6":[1,2],"1":[2,1],"0":[1,2],"2":[1,2],"7":[2,1],"3":[2,1]}} > for path /brokers/topics/NMS_NotifyQ > (kafka.controller.PartitionStateMachine$PartitionModificationsListener) > [2017-11-25 17:34:18,375] INFO [AddPartitionsListener on 2]: Partition > modification triggered {"version":1,"partitions":{"0":[1]}} for path > /brokers/topics/TempBinLogReqQ @{color} -- This message was sent by Atlassian JIRA (v7.6.3#76005)