[ https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224041#comment-14224041 ]
Bhavesh Mistry edited comment on KAFKA-1642 at 11/25/14 4:39 AM:
-----------------------------------------------------------------

[~ewencp],

I hope the steps above give you a comprehensive way to reproduce the problems with the run() method. It would be really great if we could make the client more resilient and robust, so that network and broker instability does not cause CPU spikes and degrade application performance. Hence, I would strongly suggest that we at least measure the time run(time) is taking and collect some stats. Based on some configuration, we could do CPU throttling (if needed) just to be more defensive, or at least detect that the I/O thread is consuming CPU cycles.

By the way, the experimental patch still works for the steps described above as well, due to the hard-coded back-off.

Any time you have a patch or anything else, please let me know and I will test it out (you have my email id). Once again, thanks for your detailed analysis and for looking at this at short notice.

Please look into ClusterConnectionStates and how it manages the state of a node when disconnecting immediately. Please also look into connecting(int node, long now) and the following code (I feel connecting needs to come before, not after):

    selector.connect(node.id(), new InetSocketAddress(node.host(), node.port()), this.socketSendBuffer, this.socketReceiveBuffer);
    this.connectionStates.connecting(node.id(), now);

Thanks,
Bhavesh

was (Author: bmis13):
[~ewencp],

I hope the steps above give you a comprehensive way to reproduce the problems with the run() method. It would be really great if we could make the client more resilient and robust, so that network and broker instability does not cause CPU spikes and degrade application performance. Hence, I would strongly suggest that we at least measure the time run(time) is taking and, based on some configuration, do CPU throttling just to be more defensive, or at least detect that the I/O thread is consuming CPU cycles.

By the way, the experimental patch still works for the steps described above as well, due to the hard-coded back-off.
Any time you have a patch or anything else, please let me know and I will test it out. Once again, thanks for your detailed analysis.

Please look into ClusterConnectionStates and how it manages the state of a node when disconnecting immediately. Please also look into connecting(int node, long now) and the following code (I feel connecting needs to come before, not after):

    selector.connect(node.id(), new InetSocketAddress(node.host(), node.port()), this.socketSendBuffer, this.socketReceiveBuffer);
    this.connectionStates.connecting(node.id(), now);

Thanks,
Bhavesh

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1642
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1642
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer
>   Affects Versions: 0.8.1.1, 0.8.2
>            Reporter: Bhavesh Mistry
>            Assignee: Ewen Cheslack-Postava
>            Priority: Blocker
>             Fix For: 0.8.2
>
>         Attachments: 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch,
>                      KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch,
>                      KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when the network connection is lost for a while. It
> seems the network I/O thread is very busy logging the following error message. Is
> this expected behavior?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka producer I/O thread:
> java.lang.IllegalStateException: No entry found for node -2
>     at org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
>     at org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
>     at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
>     at org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
>     at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
>     at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
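
The ordering concern raised in the comment (recording the connecting state before calling selector.connect) can be illustrated with a minimal sketch. Everything below is hypothetical and simplified: ConnectionStatesSketch, ConnectOrderSketch and Connector are stand-ins invented for illustration, not Kafka's actual classes, and the surrounding try/catch is an assumption suggested by the stack trace, where disconnected() is reached from initiateConnect(). The sketch only shows why a connect call that fails immediately could lead to "No entry found for node" if the state is recorded afterwards.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for ClusterConnectionStates (not Kafka's class).
class ConnectionStatesSketch {
    enum State { CONNECTING, DISCONNECTED }

    private final Map<Integer, State> states = new HashMap<>();

    // Creates (or overwrites) the entry for this node.
    void connecting(int node) {
        states.put(node, State.CONNECTING);
    }

    // Requires an existing entry, like the nodeState() lookup in the stack trace.
    void disconnected(int node) {
        if (!states.containsKey(node)) {
            throw new IllegalStateException("No entry found for node " + node);
        }
        states.put(node, State.DISCONNECTED);
    }

    State stateOf(int node) {
        return states.get(node);
    }
}

class ConnectOrderSketch {
    // Hypothetical stand-in for the selector.connect(...) call.
    interface Connector {
        void connect(int node) throws IOException;
    }

    // Order as quoted in the comment: connect first, record the state afterwards.
    static void connectThenRecord(ConnectionStatesSketch s, Connector c, int node) {
        try {
            c.connect(node);
            s.connecting(node);   // never reached if connect() throws
        } catch (IOException e) {
            s.disconnected(node); // no entry exists yet, so this throws
        }
    }

    // Suggested order: record the state first, then connect.
    static void recordThenConnect(ConnectionStatesSketch s, Connector c, int node) {
        try {
            s.connecting(node);
            c.connect(node);
        } catch (IOException e) {
            s.disconnected(node); // entry exists, so the node is marked DISCONNECTED
        }
    }

    public static void main(String[] args) {
        // A connector that fails immediately, as when the network is down.
        Connector failing = node -> { throw new IOException("network unreachable"); };

        ConnectionStatesSketch ok = new ConnectionStatesSketch();
        recordThenConnect(ok, failing, -2);
        System.out.println("record-then-connect state: " + ok.stateOf(-2));

        try {
            connectThenRecord(new ConnectionStatesSketch(), failing, -2);
        } catch (IllegalStateException e) {
            System.out.println("connect-then-record failed: " + e.getMessage());
        }
    }
}
```

With the state recorded first, an immediate connect failure leaves the node in a well-defined disconnected state, so a reconnect back-off has something to act on instead of the I/O thread hitting the same exception on every pass.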