[ https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224039#comment-14224039 ]
Bhavesh Mistry commented on KAFKA-1642:
---------------------------------------

Here are some more cases to reproduce this by simulating a network connection issue with only one of the brokers. The problem still persists.

Case 1: Connection to one broker is down (note: according to ZK, the leader for the partition is still b1)

Have three brokers: b1, b2, b3

1) Start your daemon program and keep sending data to all the brokers.
2) Observe that you have connections with
     netstat -a | grep -E 'b1|b2|b3'
   (keep pumping data for 5 minutes and observe normal behavior using top -pid or top -p java_pid)
3) Simulate a network problem that prevents establishing new TCP connections, as follows, while the Java program continues to pump data aggressively (please note that the TCP connection to b1 is still active and connected):
   a) sudo vi /etc/hosts and add the entry "127.0.0.1 b1"
   b) /etc/init.d/network restart
   After a while (5 to 7 minutes) you will see the issue, but keep pumping data. If you also repeat this for b2, CPU consumption will be even higher.
4) Under heavy load, the producer will now try to establish a new TCP connection to b1 and will get connection refused (note that CPU spikes up again and stays in that state), just because the connection could not be established.

Case 2: Simulate a firewall rule, such as one where you are only allowed 4 TCP connections to each broker

Do steps 1, 2 and 3 above.
4) Use an iptables rule to reject connections. To start "enforcing" the firewall:
     iptables -A OUTPUT -p tcp -m tcp -d b1 --dport 9092 -j REJECT
5) Keep pumping data while iptables rejects (you will see CPU spike to 200% or more, depending on the number of producers).

To "recover":
     iptables -D OUTPUT -p tcp -m tcp -d b1 --dport 9092 -j REJECT

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1642
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1642
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer
>    Affects Versions: 0.8.1.1, 0.8.2
>            Reporter: Bhavesh Mistry
>            Assignee: Ewen Cheslack-Postava
>            Priority: Blocker
>             Fix For: 0.8.2
>
>         Attachments: 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when the network connection is lost for a while. It seems the network IO thread is very busy logging the following error message. Is this expected behavior?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka producer I/O thread:
> java.lang.IllegalStateException: No entry found for node -2
>     at org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
>     at org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
>     at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
>     at org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
>     at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
>     at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
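
Both reproduction cases in the comment depend on the producer host being unable to open *new* TCP connections to b1 while the already established connection stays up. Before watching CPU usage, it may help to confirm that the /etc/hosts change or the iptables REJECT rule has actually taken effect. The following is only a minimal sketch for that check, using plain java.net sockets rather than anything from the Kafka client. The class name BrokerProbe, the method canConnect, and the 2000 ms timeout are illustrative assumptions; the host names b1, b2, b3 and port 9092 are taken from the steps above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Kafka: checks whether a NEW TCP
// connection to a broker can be opened from this host.
public class BrokerProbe {

    // Returns true if a TCP connection to host:port is established
    // within timeoutMs, false on refusal, timeout, or resolution failure.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers "connection refused", timeouts, and unknown hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Broker names default to the ones used in the report;
        // pass other host names as arguments to override.
        String[] brokers = args.length > 0 ? args : new String[] {"b1", "b2", "b3"};
        for (String broker : brokers) {
            // 9092 is the port used in the iptables rule above;
            // the 2000 ms timeout is an arbitrary choice.
            boolean ok = canConnect(broker, 9092, 2000);
            System.out.println(broker + ":9092 new connection possible = " + ok);
        }
    }
}
```

With the REJECT rule or the hosts entry in place, the probe should report false for b1 and true for the other brokers, assuming they are reachable. Note that it says nothing about connections the producer already holds open.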