[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224039#comment-14224039
 ] 

Bhavesh Mistry commented on KAFKA-1642:
---------------------------------------

Here are some more cases that reproduce this by simulating a network connection 
issue with only one of the brokers; the problem still persists:

Case 1:  The connection to one broker is down (note: according to ZK, the leader 
for the partition is still b1) 
Have three brokers: b1, b2, b3
1)  Start your daemon program and keep sending data to all the brokers.
2)  Verify that the connections are carrying data with  
netstat -a | grep -E 'b1|b2|b3'   (keep pumping data for 5 minutes and observe 
normal behavior using top -pid java_pid or top -p java_pid)
3) Simulate a network connection problem, or a problem establishing a new TCP 
connection, as follows while the java program continues to pump data 
aggressively (please note the TCP connection to b1 is still active and connected)
a)  sudo vi /etc/hosts and add the entry "127.0.0.1 b1" 
b) /etc/init.d/network restart. After a while (5 to 7 minutes) you will see the 
issue, but keep pumping data. Repeating this for b2 as well results in even more 
CPU consumption. 
 
4) While data is being pumped heavily, the producer will now try to establish a 
new TCP connection to b1 and will get connection refused (note that the CPU 
spikes up again and remains in that state) just because the connection could 
not be established.
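
To confirm that new TCP connections to b1 are really being refused at this 
point, a small stand-alone check along these lines may help. This is only a 
sketch and not part of the producer: the class name BrokerConnectCheck, the 
2 second timeout and the defaults "b1" / 9092 are assumptions for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class BrokerConnectCheck {
    /** Returns true if a new TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, connect timeout and unresolved host all end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed defaults: broker alias "b1" and broker port 9092.
        String host = args.length > 0 ? args[0] : "b1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9092;
        System.out.println(host + ":" + port + " reachable=" + canConnect(host, port, 2000));
    }
}
```

Running it before and after the /etc/hosts change should show the result flip 
from reachable=true to reachable=false if the simulation took effect.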

Case 2) Simulate a firewall rule, e.g. one under which you are only allowed 
4 TCP connections to each broker 

Do steps 1, 2 and 3 above.
4) Use an iptables rule to reject outgoing TCP traffic to b1 on port 9092. 
To start an "enforcing firewall":
iptables -A OUTPUT -p tcp -m tcp -d b1 --dport 9092 -j REJECT
5) Keep pumping data while iptables rejects the traffic (you will see CPU spike 
to 200% or more, depending on the number of producers)
To "recover":
iptables -D OUTPUT -p tcp -m tcp -d b1 --dport 9092 -j REJECT
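
To put a number on the spike for the producer I/O thread itself rather than 
reading it off top, something like the following could be called from inside 
the daemon's own JVM. It is a rough sketch using the standard ThreadMXBean; the 
class name NetworkThreadCpu and its methods are hypothetical, and the thread 
name prefix "kafka-producer-network-thread" is taken from the log line in the 
issue description.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class NetworkThreadCpu {
    /** Sums CPU time (ns) of live threads in this JVM whose name starts with prefix. */
    static long cpuNanos(String prefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long total = 0;
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id); // null if the thread has exited
            if (info != null && info.getThreadName().startsWith(prefix)) {
                long t = mx.getThreadCpuTime(id); // -1 if unavailable for this thread
                if (t > 0) {
                    total += t;
                }
            }
        }
        return total;
    }

    /** CPU used by matching threads over the window, as a percentage of one core. */
    static double cpuPercent(String prefix, long windowMs) throws InterruptedException {
        long startCpu = cpuNanos(prefix);
        long startWall = System.nanoTime();
        Thread.sleep(windowMs);
        long cpu = cpuNanos(prefix) - startCpu;
        long wall = System.nanoTime() - startWall;
        return 100.0 * cpu / wall;
    }
}
```

For example, logging NetworkThreadCpu.cpuPercent("kafka-producer-network-thread", 5000) 
before adding the iptables rule and again while it is in place would show 
whether the I/O thread accounts for the extra CPU.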


> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1642
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1642
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer 
>    Affects Versions: 0.8.1.1, 0.8.2
>            Reporter: Bhavesh Mistry
>            Assignee: Ewen Cheslack-Postava
>            Priority: Blocker
>             Fix For: 0.8.2
>
>         Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when the network connection is lost for a while.  
> It seems the network I/O threads are very busy logging the following error 
> message.  Is this expected behavior?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
