Hello, I'm migrating my Spark-based stream processing application to Flink (Calcite SQL and temporal tables look too attractive to resist).
My Spark app works as follows:

- the application is started periodically
- it reads a directory of Parquet files as a stream
- SQL transformations are applied
- the resulting append stream is written to another directory
- it runs until all available data is processed
- it checkpoints its state and **exits**
- upon the next run it resumes where it left off, processing only new data

I'm having difficulty replicating this start-stop-resume behavior with Flink. When I set up my input stream using:

env.readFile[Row](..., FileProcessingMode.PROCESS_CONTINUOUSLY)

... I get an infinite stream, but the application naturally keeps running until it is aborted manually.

When I use FileProcessingMode.PROCESS_ONCE, the application exits after exhausting all inputs, but it seems that Flink also treats the end of the stream as a max watermark, so, for example, it closes all tumbling windows that I don't want closed yet, since more data will arrive on the next run.

Is there a way not to emit a max watermark with PROCESS_ONCE? If so, can I still trigger a savepoint when env.execute() returns?

Alternatively, if I use PROCESS_CONTINUOUSLY along with env.executeAsync(), is there a way for me to detect when the file stream has been exhausted so I can call job.stopWithSavepoint()? (I've put rough sketches of both setups after my sign-off.)

Thanks for your help!

- Sergii
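
P.S. For concreteness, here is roughly how my source is set up. This is a minimal sketch, not my real code: the Parquet input format is stubbed out, and the paths and re-scan interval are placeholders.

```scala
import org.apache.flink.api.common.io.FileInputFormat
import org.apache.flink.streaming.api.functions.source.FileProcessingMode
import org.apache.flink.streaming.api.scala._
import org.apache.flink.types.Row

object JobSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // stand-in for my real Parquet-backed FileInputFormat[Row]
    val parquetFormat: FileInputFormat[Row] = ???

    val rows: DataStream[Row] = env.readFile(
      parquetFormat,
      "hdfs:///data/incoming",                 // placeholder input directory
      FileProcessingMode.PROCESS_CONTINUOUSLY, // or PROCESS_ONCE, see above
      10000L                                   // directory re-scan interval, ms
    )

    // ... SQL transformations, append sink to the output directory ...

    env.execute("parquet-to-parquet")
  }
}
```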
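And this is the kind of thing I'm imagining for the executeAsync() route. The "detect exhaustion" step is exactly the part I don't know how to implement, and I'm only assuming from the Javadoc that passing advanceToEndOfEventTime = false to stopWithSavepoint() would keep the final max watermark from being emitted:

```scala
import org.apache.flink.core.execution.JobClient

// submit the job without blocking, keeping a handle to control it
val client: JobClient = env.executeAsync("parquet-to-parquet")

// ??? somehow detect that the directory scan found no new files ???

// stopWithSavepoint(advanceToEndOfEventTime, savepointDirectory):
// with false, the sources should stop without advancing to MAX_WATERMARK,
// so my tumbling windows would stay open in the savepointed state
val savepointPath: String = client
  .stopWithSavepoint(false, "hdfs:///savepoints") // placeholder directory
  .get() // CompletableFuture[String], blocks until the savepoint completes

println(s"Savepoint written to: $savepointPath")
```

Is that a workable pattern, or is there a more idiomatic way to get this start-stop-resume behavior?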