Hi,
Thanks for the hint. The infinite loop was the solution and my pipeline
works now.
Regards
Klemens
On 24.11.20 16:59, Timo Walther wrote:
For debugging you can also implement a simple non-parallel source
using
`org.apache.flink.streaming.api.functions.source.SourceFunction`. You
would need to implement the run() method with an endless loop after
emitting all your records.
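A minimal sketch of such a debugging source could look like the following (the String element type and the emitted records are just placeholders):

import org.apache.flink.streaming.api.functions.source.SourceFunction;

// Non-parallel debugging source: emits a fixed set of records and then
// idles forever, so the job keeps running and processing-time windows
// can still fire.
public class DebugSource implements SourceFunction<String> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        // Emit all test records up front (placeholder values).
        for (String record : new String[] {"a", "b", "c"}) {
            ctx.collect(record);
        }
        // Endless loop: keep the source alive until the job is cancelled.
        while (running) {
            Thread.sleep(1000);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}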
Regards,
Timo
On 24.11.20 16:07, Klemens Muthmann wrote:
Hi,
Thanks for your reply. I am using processing time instead of event
time, since we do get the events in batches and some might arrive
days later.
But for my current dev setup I just use a CSV dump of finite size as
input. I will hand over the pipeline to some other guys, who will
need to integrate it with an Apache Kafka service. Output is written
to a Postgres database.
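For the later Kafka integration, a source along these lines should be unbounded out of the box (broker address, topic name and group id below are placeholders, not our actual setup):

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker
props.setProperty("group.id", "my-pipeline");             // placeholder group id

// A Kafka source is unbounded, so the job keeps running and the
// processing-time session windows get a chance to close.
DataStream<String> stream = env.addSource(
        new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props)); // placeholder topic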
I'll have a look at your proposal and let you know if it worked,
after having finished a few prerequisite parts.
Regards
Klemens
On 24.11.20 12:59, Timo Walther wrote:
Hi Klemens,
What you are observing is one of the reasons why event-time should be
preferred over processing-time. Event-time uses the timestamp of your
data, while processing-time is too basic for many use cases.
Especially when you want to reprocess historic data, you want to do
that at full speed instead of waiting 1 hour for 1-hour windows.
If you want to use processing-time nevertheless, you need to use a
source that produces unbounded streams instead of bounded streams,
such that the pipeline execution is theoretically infinite. Some
documentation can be found here [1]; you need to use
`FileProcessingMode.PROCESS_CONTINUOUSLY`. But what kind of
connector are you currently using?
Regards,
Timo
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/datastream_api.html#data-sources
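A sketch with a continuously monitored file, assuming a plain text/CSV file and a 10-second scan interval (both placeholders), could look like this:

import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

String path = "/path/to/dump.csv"; // placeholder path
TextInputFormat format = new TextInputFormat(new Path(path));

// PROCESS_CONTINUOUSLY re-scans the path every 10 seconds, so the stream
// is unbounded and the job does not finish after the first read. Note
// that a modified file is re-ingested completely on each scan.
DataStream<String> lines = env.readFile(
        format, path, FileProcessingMode.PROCESS_CONTINUOUSLY, 10_000L);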
On 24.11.20 09:59, Klemens Muthmann wrote:
Hi,
I have written an Apache Flink Pipeline containing the following
piece of code (Java):
stream.window(ProcessingTimeSessionWindows.withGap(Time.seconds(50))).aggregate(new CustomAggregator()).print();
If I run the pipeline using local execution I see the following
behavior: the `createAccumulator` and `add` methods of the
"CustomAggregator" are called correctly with the correct data.
However, `getResult` is never called and my pipeline simply finishes.
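For reference, the "CustomAggregator" implements Flink's AggregateFunction; a simplified sketch of its shape (the String input and count accumulator here are placeholders, not my actual types) would be:

import org.apache.flink.api.common.functions.AggregateFunction;

// Simplified stand-in for the CustomAggregator: counts records per window.
public class CustomAggregator implements AggregateFunction<String, Long, Long> {

    @Override
    public Long createAccumulator() {
        return 0L; // called when a new session window is opened
    }

    @Override
    public Long add(String value, Long accumulator) {
        return accumulator + 1; // called for every record assigned to the window
    }

    @Override
    public Long getResult(Long accumulator) {
        return accumulator; // only called when the window actually fires
    }

    @Override
    public Long merge(Long a, Long b) {
        return a + b; // used when session windows are merged
    }
}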
So I did a little research and found out that it works if I change
the code to:
stream.window(ProcessingTimeSessionWindows.withGap(Time.seconds(1))).aggregate(new CustomAggregator()).print();
Notice the reduced gap time for the processing-time session window.
So it seems that execution only continues if the window has been
closed, and if that takes too long, the execution simply aborts. I
guess another factor playing a part in the problem is that my
initial data is read in much faster than 50 seconds. This leaves the
pipeline in a state where it is only waiting for the window to close
and, having nothing else to do, it decides that there is no work left
and simply shuts down.
My question now is whether it is possible to tell the local execution
environment to wait for that window to close, instead of just
shutting down.
Thanks and Regards
Klemens Muthmann
--
Kind regards
Dr.-Ing. Klemens Muthmann
-----------------------------------
Cyface GmbH
Hertha-Lindner-Straße 10
01067 Dresden
web: www.cyface.de
email: klemens.muthm...@cyface.de