If I remember correctly, there's a fix for this in Flink 1.14 (but the feature is disabled by default in 1.14, and enabled by default in 1.15). (I'm thinking that execution.checkpointing.checkpoints-after-tasks-finish.enabled [1] takes care of this.)
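For reference, if that is indeed the right option, enabling it on Flink 1.14 might look like this in flink-conf.yaml (just a sketch based on the docs link below; in 1.15 it should already be on by default):

```yaml
# Sketch for flink-conf.yaml, Flink 1.14+ only.
# Disabled by default in 1.14, enabled by default in 1.15.
execution.checkpointing.checkpoints-after-tasks-finish.enabled: true
```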
With Flink 1.13 I believe you'll have to handle this yourself somehow.

Regards,
David

[1] https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/#execution-checkpointing-checkpoints-after-tasks-finish-enabled

On Wed, Aug 31, 2022 at 6:26 AM David Clutter <david.clut...@bridg.com> wrote:
> I am using Flink 1.13.1 on AWS EMR 6.4. I have an existing application
> using the DataStream API that I would like to modify to write output to
> S3. I am testing the StreamingFileSink with a bounded input. I have
> enabled checkpointing.
>
> A couple of questions:
> 1) When the program finishes, all the files remain .inprogress. Is that
> "Important Note 2" in the documentation
> <https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/connectors/datastream/streamfile_sink/>?
> Is there a solution to this other than renaming the files myself?
> Renaming the files in S3 could be costly, I think.
>
> 2) If I use a deprecated method such as DataStream.writeAsText(), is that
> guaranteed to write *all* the records from the stream, as long as the job
> does not fail? I understand checkpointing will not be effective here.
>
> Thanks,
> David