Shannon Carey created FLINK-4069:
------------------------------------

             Summary: Kafka Consumer should not initialize on construction
                 Key: FLINK-4069
                 URL: https://issues.apache.org/jira/browse/FLINK-4069
             Project: Flink
          Issue Type: Improvement
          Components: Kafka Connector
    Affects Versions: 1.0.3
            Reporter: Shannon Carey


The Kafka Consumer connector currently interacts over the network with Kafka in 
order to get partition metadata when the class is constructed. Instead, it 
should do that work when the job actually begins to run (for example, in 
AbstractRichFunction#open() of FlinkKafkaConsumer0?).

The main weakness of broker querying in the constructor is that if there are 
network problems, Flink might take a long time (eg. ~1hr) inside the 
user-supplied main() method while it attempts to contact each broker and 
perform retries. In general, setting up the Kafka partitions does not seem 
strictly necessary as part of execution of main() in order to set up the job 
plan/topology.

However, as Robert Metzger mentions, there are important concerns with how 
Kafka partitions are handled:

"The main reason why we do the querying centrally is:
a) avoid overloading the brokers
b) send the same list of partitions (in the same order) to all parallel 
consumers to do a fixed partition assignments (also across restarts). When we 
do the querying in the open() method, we need to make sure that all partitions 
are assigned, without duplicates (also after restarts in case of failures)."

See also the mailing list discussion: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/API-request-to-submit-job-takes-over-1hr-td7319.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to