Re: Streaming API has a long delay at the beginning of the process.

2017-09-18 Thread Yuta Morisawa
Hi Fabian, Thanks a lot. I now have a better understanding. > Operators are never GC'd (unless a job was cancelled) That's great information. Maybe this is related to so-called Managed Memory. The documentation would be better if detailed documents about Memory Management existed. Thank you, Yuta On 2017

Re: Streaming API has a long delay at the beginning of the process.

2017-09-18 Thread Fabian Hueske
Hi Yuta, you got most things right :-) 3) Sources (such as Kafka connectors) are also considered operators and start immediately because they are sources. 4) All other operators start when they need to process their first record. Operators are never GC'd (unless a job was cancelled), so the setup
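
To make the lazy-start behaviour described above concrete, here is a minimal Python sketch of a toy model, not of Flink itself: `LazyOperator`, `run`, and the setup and per-record costs are hypothetical names and made-up numbers. Each operator pays its setup cost on the first record only and is then kept for the rest of the job, so only the first record sees the setup delay.

```python
"""Toy model of lazily set-up operators (hypothetical, NOT Flink's API).

LazyOperator, run() and the cost numbers are illustrative assumptions only.
"""


class LazyOperator:
    """An operator that performs its (expensive) setup on the first record only."""

    def __init__(self, name, setup_cost_ms, per_record_ms):
        self.name = name
        self.setup_cost_ms = setup_cost_ms
        self.per_record_ms = per_record_ms
        self.initialized = False
        self.setup_count = 0

    def process(self, record):
        """Return (record, simulated cost in ms) for one record."""
        cost = self.per_record_ms
        if not self.initialized:
            # One-time setup, paid by the first record only. The operator object
            # is kept for the whole run, so this branch never executes again.
            cost += self.setup_cost_ms
            self.initialized = True
            self.setup_count += 1
        return record, cost


def run(records, operators):
    """Push each record through a chain of operators; return per-record costs."""
    costs = []
    for record in records:
        total = 0
        value = record
        for op in operators:
            value, cost = op.process(value)
            total += cost
        costs.append(total)
    return costs


if __name__ == "__main__":
    chain = [LazyOperator("map", 300, 1), LazyOperator("sink", 500, 1)]
    print(run(range(5), chain))  # [802, 2, 2, 2, 2]
```

In this model the setup cost would only be paid again if the operator objects were recreated, which mirrors the remark that operators live until the job is cancelled.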

Re: Streaming API has a long delay at the beginning of the process.

2017-09-15 Thread Yuta Morisawa
Hi Fabian, Thank you for your description. This is my understanding: 1) At the exact time the execute() method is called, Flink creates the JobGraph, submits it to the JobManager, deploys tasks to the TaskManagers, and DOES NOT execute each operator. 2) Operators are executed when they are needed. 3) Sources (kafka-co

Re: Streaming API has a long delay at the beginning of the process.

2017-09-15 Thread Fabian Hueske
Hi Yuta, when the execute() method is called, a so-called JobGraph is constructed from all operators that have been added before by calling map(), keyBy(), and so on. The JobGraph is then submitted to the JobManager, which is the master process in Flink. Based on the JobGraph, the master deploys
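
The deferred construction described here can be illustrated with a small Python sketch. It is a toy under stated assumptions, not Flink's DataStream API: `Pipeline`, `map`, `key_by`, `build_graph`, and `execute` are hypothetical names. Calling `map()` or `key_by()` only records a transformation; the user functions run only after `execute()` has built the graph. Unlike Flink, which submits the graph to the JobManager and deploys tasks to TaskManagers, the toy runs everything in one process.

```python
"""Toy model of deferred job-graph construction (hypothetical, NOT Flink's API)."""


class Pipeline:
    def __init__(self, source):
        self.source = source        # iterable of input records
        self.transformations = []   # recorded here, not executed yet

    def map(self, fn):
        # Only remember the transformation; fn is not called here.
        self.transformations.append(("map", fn))
        return self

    def key_by(self, key_fn):
        self.transformations.append(("keyBy", key_fn))
        return self

    def build_graph(self):
        """Turn the recorded transformations into an ordered list of node names."""
        return ["source"] + [kind for kind, _ in self.transformations]

    def execute(self):
        graph = self.build_graph()  # loosely analogous to building the JobGraph
        results = []
        for record in self.source:
            value = record
            for kind, fn in self.transformations:
                if kind == "map":
                    value = fn(value)
                else:
                    # keyBy: tag the record with its key (a real keyBy
                    # partitions the stream by that key).
                    value = (fn(value), value)
            results.append(value)
        return graph, results


if __name__ == "__main__":
    p = Pipeline([1, 2, 3]).map(lambda x: x + 1).key_by(lambda x: x % 2)
    graph, results = p.execute()
    print(graph)    # ['source', 'map', 'keyBy']
    print(results)  # [(0, 2), (1, 3), (0, 4)]
```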

Re: Streaming API has a long delay at the beginning of the process.

2017-09-14 Thread Yuta Morisawa
Hi Fabian, > If I understand you correctly, the problem is only for the first events > that are processed. Yes. More precisely, the first 300 Kafka messages. > AFAIK, Flink lazily instantiates its operators which means that a source > task starts to consume records from Kafka before the subsequent t

Re: Streaming API has a long delay at the beginning of the process.

2017-09-14 Thread Fabian Hueske
Hi, If I understand you correctly, the problem is only for the first events that are processed. AFAIK, Flink lazily instantiates its operators, which means that a source task starts to consume records from Kafka before the subsequent tasks have been started. That's why the latency of the first rec
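
The effect described here, a source that starts consuming before the downstream tasks are ready, can be modelled as a simple FIFO queue. The sketch below is a hypothetical discrete-time model with made-up parameters (`interval_ms`, `ready_ms`, `service_ms`), not measured Flink behaviour: records emitted before the downstream task is ready queue up, so their latency includes the startup wait, and latency falls back to the pure processing cost once the backlog drains.

```python
"""Toy model of first-record latency (hypothetical parameters, not Flink).

Assumptions: the source emits one record every interval_ms starting at t=0,
the downstream task only becomes ready at ready_ms, and it then needs
service_ms per record, processing records strictly in order.
"""


def first_record_latencies(n, interval_ms, ready_ms, service_ms):
    latencies = []
    prev_finish = 0
    for i in range(n):
        emitted = i * interval_ms
        # A record can start only once it exists, the downstream task is
        # ready, and the previous record has finished (FIFO, one worker).
        start = max(emitted, ready_ms, prev_finish)
        finish = start + service_ms
        latencies.append(finish - emitted)
        prev_finish = finish
    return latencies


if __name__ == "__main__":
    lat = first_record_latencies(200, interval_ms=10, ready_ms=1000, service_ms=2)
    print(lat[0], lat[124], lat[125], lat[-1])  # 1002 10 2 2
    # Records delayed beyond the pure processing cost:
    print(sum(1 for x in lat if x > 2))  # 125
```

The backlog only drains because `service_ms` is smaller than `interval_ms` in this example; if records arrived faster than they could be processed, latency would keep growing instead of recovering.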

Streaming API has a long delay at the beginning of the process.

2017-09-12 Thread Yuta Morisawa
Hi, I am worried about the delay in the Streaming API. My application gets data from Kafka connectors, processes it, and then pushes the results to Kafka producers. The problem is that the app suffers a long delay when the first data comes into the cluster. It takes about 1000ms to process data