Aaditya Ramesh created KAFKA-2553:
-------------------------------------

             Summary: Kafka Consumer Hangs after Network Partition
                 Key: KAFKA-2553
                 URL: https://issues.apache.org/jira/browse/KAFKA-2553
             Project: Kafka
          Issue Type: Bug
          Components: consumer
    Affects Versions: 0.8.1.1
         Environment: Amazon EC2, Ubuntu 12.04.
            Reporter: Aaditya Ramesh
            Assignee: Neha Narkhede
         Attachments: kafka_bug_report

We have a Kafka consumer on an EC2 instance in Ireland that fetches data from a
Kafka cluster in a datacenter in the eastern United States. We sporadically
encounter network partitions in which we cannot ping any machine across the
two data centers (every ping times out). Partitions of this kind are not
unusual for inter-data-center links. However, the Kafka consumer's connection
to ZooKeeper never recovers after one of these network hiccups. Even after the
network has recovered, the process must be fully restarted before it resumes
consuming from the remote data center. The relevant code in
ZookeeperConsumerConnector.scala catches all Throwables and does nothing with
them. As a result, the process is never alerted, and no metric is exposed
that we could use to diagnose the hung state. We therefore patched the client
code in our codebase to call System.exit(0) whenever this occurs, since a
restart is better than failing silently.
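The alternative to silently swallowing the Throwable can be sketched as
follows. This is a minimal, hypothetical Java example, not the actual
ZookeeperConsumerConnector code or any Kafka API: the class name FailLoudly,
its failure counter, and the injectable onFatal handler are assumptions made
for illustration. It records each failure where monitoring could see it and
passes the Throwable to a handler, which in a deployment like ours could
call System.exit.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/**
 * Hypothetical wrapper (not part of Kafka): runs a unit of work and, instead
 * of silently swallowing any Throwable, counts it and hands it to a
 * fatal-error handler.
 */
public final class FailLoudly {
    // Failure counter that a monitoring system could alert on when non-zero.
    private final AtomicLong failureCount = new AtomicLong();
    // Invoked on every failure; in production this might terminate the process.
    private final Consumer<Throwable> onFatal;

    public FailLoudly(Consumer<Throwable> onFatal) {
        this.onFatal = onFatal;
    }

    public long failureCount() {
        return failureCount.get();
    }

    /** Runs the task; returns true if it completed, false if it threw. */
    public boolean run(Runnable task) {
        try {
            task.run();
            return true;
        } catch (Throwable t) {
            failureCount.incrementAndGet();              // make the failure visible
            System.err.println("Consumer task failed: " + t);
            onFatal.accept(t);                           // escalate instead of ignoring
            return false;
        }
    }

    public static void main(String[] args) {
        // In production the handler could be t -> System.exit(1);
        // here it only logs so the demo keeps running.
        FailLoudly guard = new FailLoudly(t -> System.err.println("fatal handler invoked"));
        guard.run(() -> { throw new IllegalStateException("ZooKeeper session lost"); });
        System.out.println("failures=" + guard.failureCount());
    }
}
```

Injecting the handler keeps the wrapper testable. If the handler does exit
the process, a non-zero status is usually the safer choice, since process
supervisors configured to restart only on failure treat exit code 0 as a
clean shutdown.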



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
