[ https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234297#comment-14234297 ]
Bhavesh Mistry commented on KAFKA-1642:
---------------------------------------

[~ewencp],

1) I will post toward KAFKA-1788 and perhaps link the issues.

2) True, some sort of measurement of execution time would be great: the 5, 10 ... 25, 50, 95 and 99 percentiles. The point is just to measure the duration and report the rate of execution.

3) I agree with what you are saying, and I have observed the same behavior. My only recommendation is to add some intelligence around *timeouts*: if the timeout is zero on consecutive iterations over a long period, then there is a problem. (A little more defensive.)

4) Again, I agree with your point, but in your previous comments you mentioned that you may consider having back-off logic further up the chain. So I was just checking whether run() is the best place to do that check. Again, maybe add some intelligence here: if you get consecutive "Exception"s, then the likelihood of high CPU is high.

5) Ok. I agree. What you are saying is that data needs to be dequeued so that more data can be enqueued, even in the event of network loss. Is my understanding correct?

6) All I am saying is that a network firewall rule (such as only 2 TCP connections per source host), or a broker running out of file descriptors, can prevent a new connection to a broker from being established while the client still has a live and active TCP connection to that same broker. But based on what I see, the method *initiateConnect* will mark the entire broker or node status as disconnected. Is this expected behavior? So the question is: will the client continue to send data?

Thank you very much for entertaining my questions so far. I will test out the patch next week.
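For illustration only, here is a minimal sketch of the defensive check suggested in points 3 and 4: count consecutive loop iterations that either used a zero timeout or ended in an exception, and suggest a capped exponential back-off once a threshold is exceeded. The class `SpinGuard`, its method `record`, and all parameter names are hypothetical and are not part of the Kafka client. A zero timeout can be perfectly normal under load, so the threshold is assumed to be chosen high enough to flag only sustained spinning.

```java
/**
 * Hypothetical helper (not Kafka code): flags a sender loop that appears to be
 * spinning, i.e. many consecutive iterations with a zero poll timeout or an exception.
 */
final class SpinGuard {
    private final int threshold;      // consecutive suspicious iterations tolerated
    private final long baseBackoffMs; // delay for the first iteration over the threshold
    private final long maxBackoffMs;  // upper bound on the suggested delay
    private int consecutiveBad = 0;

    SpinGuard(int threshold, long baseBackoffMs, long maxBackoffMs) {
        if (threshold < 1 || baseBackoffMs < 1 || maxBackoffMs < baseBackoffMs) {
            throw new IllegalArgumentException("invalid SpinGuard settings");
        }
        this.threshold = threshold;
        this.baseBackoffMs = baseBackoffMs;
        this.maxBackoffMs = maxBackoffMs;
    }

    /**
     * Records one loop iteration and returns how long (in ms) the caller
     * should sleep before the next one; 0 means "do not back off".
     */
    long record(long pollTimeoutMs, boolean threwException) {
        boolean suspicious = threwException || pollTimeoutMs == 0;
        if (!suspicious) {
            consecutiveBad = 0; // a healthy iteration resets the streak
            return 0;
        }
        if (consecutiveBad < Integer.MAX_VALUE) {
            consecutiveBad++;
        }
        if (consecutiveBad <= threshold) {
            return 0; // still within tolerance
        }
        // Double the delay for each iteration past the threshold, capped at maxBackoffMs.
        long delay = baseBackoffMs;
        int excess = consecutiveBad - threshold - 1;
        for (int i = 0; i < excess && delay < maxBackoffMs; i++) {
            delay = (delay > maxBackoffMs / 2) ? maxBackoffMs : delay * 2;
        }
        return Math.min(delay, maxBackoffMs);
    }
}
```

A caller would invoke `record(...)` once per loop iteration and sleep for the returned number of milliseconds when it is non-zero. Whether such a check belongs in run() or further up the chain is exactly the open question in point 4.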
Thanks,
Bhavesh

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1642
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1642
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer
>    Affects Versions: 0.8.2
>            Reporter: Bhavesh Mistry
>            Assignee: Ewen Cheslack-Postava
>            Priority: Blocker
>             Fix For: 0.8.2
>
>         Attachments: 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch,
>                      KAFKA-1642.patch, KAFKA-1642.patch,
>                      KAFKA-1642_2014-10-20_17:33:57.patch,
>                      KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while. It
> seems network IO thread are very busy logging following error message. Is
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka producer I/O thread:
> java.lang.IllegalStateException: No entry found for node -2
>     at org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
>     at org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
>     at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
>     at org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
>     at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
>     at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)