Re: Process available data and stop with savepoint

2020-05-18 Thread Arvid Heise
> …propagate to the output to understand that processing has finished.
>
> Again, thanks everyone for your help!
> - Sergii
>
> On Mon, May 18, 2020 at 8:45 AM Thomas Huang wrote:
>> Hi,
>> Actually, it seems like Spark dynamic allocation saves more resources…

Re: Process available data and stop with savepoint

2020-05-18 Thread Sergii Mikhtoniuk
…case.
> From: Arvid Heise
> Sent: Monday, May 18, 2020 11:15:09 PM
> To: Congxian Qiu
> Cc: Sergii Mikhtoniuk; user <user@flink.apache.org>
> Subject: Re: Process available data and stop with savepoint
>
> Hi Se…

Re: Process available data and stop with savepoint

2020-05-18 Thread Thomas Huang
Hi, actually it seems like Spark dynamic allocation saves more resources in that case.
> From: Arvid Heise
> Sent: Monday, May 18, 2020 11:15:09 PM
> To: Congxian Qiu
> Cc: Sergii Mikhtoniuk; user <user@flink.apache.org>
> Subject: Re: Process available data and stop with savepoint
>
> Hi Sergii…

Re: Process available data and stop with savepoint

2020-05-18 Thread Arvid Heise
Hi Sergii, your requirements feel a bit odd: it's neither batch nor streaming. Could you tell us why it's not possible to let the job run continuously as a streaming job? Is it just a matter of saving costs? If so, you could monitor the number of records being processed and trigger stop…
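For the archive, a minimal sketch of what that suggestion could look like against Flink's monitoring REST API: poll the per-vertex `read-records` metrics from `GET /jobs/:jobid`, and once two consecutive polls see no progress, trigger stop-with-savepoint via `POST /jobs/:jobid/stop`. This is not code from the thread; the JobManager address, job id, savepoint directory, poll interval, and the crude regex-based JSON parsing are all placeholder assumptions.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StopWhenIdle {
    static final String REST = "http://localhost:8081"; // assumed JobManager REST address
    static final String JOB = "<job-id>";               // placeholder job id

    public static void main(String[] args) throws Exception {
        long previous = -1;
        while (true) {
            long current = sumReadRecords(get(REST + "/jobs/" + JOB));
            if (current == previous) {
                // No progress between two polls: stop with a savepoint,
                // draining the pipeline so in-flight data is flushed.
                post(REST + "/jobs/" + JOB + "/stop",
                     "{\"targetDirectory\":\"file:///tmp/savepoints\",\"drain\":true}");
                return;
            }
            previous = current;
            Thread.sleep(30_000); // poll interval; tune to the source's cadence
        }
    }

    // Sum the "read-records" metric over all job vertices (crude regex, sketch only).
    static long sumReadRecords(String json) {
        long sum = 0;
        Matcher m = Pattern.compile("\"read-records\":(\\d+)").matcher(json);
        while (m.find()) sum += Long.parseLong(m.group(1));
        return sum;
    }

    static String get(String url) throws Exception {
        HttpURLConnection c = (HttpURLConnection) new URL(url).openConnection();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(c.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            for (String line; (line = r.readLine()) != null; ) sb.append(line);
            return sb.toString();
        }
    }

    static void post(String url, String body) throws Exception {
        HttpURLConnection c = (HttpURLConnection) new URL(url).openConnection();
        c.setRequestMethod("POST");
        c.setRequestProperty("Content-Type", "application/json");
        c.setDoOutput(true);
        try (OutputStream os = c.getOutputStream()) {
            os.write(body.getBytes(StandardCharsets.UTF_8));
        }
        c.getResponseCode(); // fire the request; the returned trigger id is ignored here
    }
}
```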

Re: Process available data and stop with savepoint

2020-05-17 Thread Congxian Qiu
Hi Sergii, if I understand correctly, you want to process all the files in some directory without processing them multiple times. I'm not sure whether using `FileProcessingMode#PROCESS_CONTINUOUSLY` instead of `FileProcessingMode#PROCESS_ONCE`[1] can satisfy your needs and keep the job running…
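For reference, a minimal sketch of the continuous mode Congxian mentions, using the DataStream `readFile` API; the input directory and the 10-second scan interval are placeholders. One caveat relevant to "do not process them multiple times": under `PROCESS_CONTINUOUSLY`, a file that is modified is re-processed in its entirety.

```java
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

public class ContinuousFileSource {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        String dir = "file:///data/input"; // placeholder input directory
        TextInputFormat format = new TextInputFormat(new Path(dir));

        // PROCESS_CONTINUOUSLY re-scans the directory (here every 10s) and
        // ingests files as they appear, so the job keeps running; with
        // PROCESS_ONCE the source reads the current contents and finishes.
        DataStream<String> lines = env.readFile(
                format, dir, FileProcessingMode.PROCESS_CONTINUOUSLY, 10_000L);

        lines.print();
        env.execute("continuous file source");
    }
}
```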

Process available data and stop with savepoint

2020-05-17 Thread Sergii Mikhtoniuk
Hello, I'm migrating my Spark-based stream processing application to Flink (Calcite SQL and temporal tables look too attractive to resist). My Spark app works as follows:
- the application is started periodically
- it reads a directory of Parquet files as a stream
- SQL transformations are applied
- …
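For what it's worth, a minimal sketch of the Flink shape of such a pipeline with the Table API on a 1.10-era setup: the in-memory source stands in for the Parquet directory, and the table name, field names, and query are illustrative only.

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class SqlOverStream {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Stand-in for the Parquet directory source (see the readFile
        // sketch above for a continuously monitored directory).
        DataStream<Tuple2<String, Long>> records =
                env.fromElements(Tuple2.of("a", 1L), Tuple2.of("b", 2L));

        // Register the stream as a table and apply a SQL transformation.
        tEnv.createTemporaryView("records", records, "k, v");
        Table result = tEnv.sqlQuery(
                "SELECT k, SUM(v) AS total FROM records GROUP BY k");

        // The grouped aggregate updates its results, hence a retract stream.
        tEnv.toRetractStream(result, Row.class).print();
        env.execute("sql over stream");
    }
}
```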