Ah, looks like I need Flink 1.12 for this. I'm still on 1.11.

On Fri, Feb 5, 2021, 08:37 Dan Hill <quietgol...@gmail.com> wrote:
> Thanks Aljoscha!
>
> On Fri, Feb 5, 2021 at 1:48 AM Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
>> Hi Dan,
>>
>> I'm afraid this is not easily possible using the DataStream API in
>> STREAMING execution mode today. However, there is one possible solution
>> and we're introducing changes that will also make this work on STREAMING
>> mode.
>>
>> The possible solution is to use the `FileSink` instead of the
>> `StreamingFileSink`. This is an updated version of the sink that works
>> in both BATCH and STREAMING mode (see [1]). If you use BATCH execution
>> mode all your files should be "completed" at the end.
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-master/dev/datastream_execution_mode.html
>>
>> The thing we're currently working on is FLIP-147 [2], which will allow
>> sinks (and other operators) to always do one final checkpoint before
>> shutting down. This will allow them to move the last outstanding
>> inprogress files over to finished as well.
>>
>> [2] https://cwiki.apache.org/confluence/x/mw-ZCQ
>>
>> I hope that helps!
>>
>> Best,
>> Aljoscha
>>
>> On 2021/02/04 21:37, Dan Hill wrote:
>> >Hi Flink user group,
>> >
>> >*Background*
>> >I'm changing a Flink SQL job to use Datastream. I'm updating an existing
>> >Minicluster test in my code. It has a similar structure to other tests in
>> >flink-tests. I call StreamExecutionEnvironment.execute. My tests sink
>> >using StreamingFileSink Bulk Formats to tmp local disk.
>> >
>> >*Issue*
>> >When I try to check the files on local disk, I see
>> >".part-0-0.inprogress.1234abcd-5678-uuid...".
>> >
>> >*Question*
>> >What's the best way to get the test to complete the outputs? I tried
>> >checkpointing very frequently, sleeping, etc but these didn't work.
>> >
>> >Thanks!
>> >- Dan
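
For the test-side check described in the original question, here is a minimal sketch of a helper that reports which files in the sink's output directory are still in progress. It is plain Java and not part of Flink. The class and method names (`InProgressFiles`, `isInProgress`, `findInProgress`) are hypothetical, and the only naming rule it relies on is an assumption taken from the file name quoted above: a pending part file has ".inprogress." in its name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical test helper; not part of Flink.
final class InProgressFiles {

    // Assumption: pending part files carry ".inprogress." in their name,
    // as in ".part-0-0.inprogress.1234abcd-5678-uuid..." from the report.
    static boolean isInProgress(Path file) {
        return file.getFileName().toString().contains(".inprogress.");
    }

    // Walks outputDir recursively (the sink may write into subdirectories)
    // and returns every regular file that still looks in-progress, sorted.
    static List<Path> findInProgress(Path outputDir) throws IOException {
        try (Stream<Path> paths = Files.walk(outputDir)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(InProgressFiles::isInProgress)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate a sink output directory with one pending and one finished file.
        Path dir = Files.createTempDirectory("sink-output");
        Files.createFile(dir.resolve(".part-0-0.inprogress.1234abcd"));
        Files.createFile(dir.resolve("part-0-1"));
        System.out.println("Still in progress: " + findInProgress(dir));
    }
}
```

A test could call `findInProgress` on the output directory after the job finishes and fail if the list is non-empty, which makes the symptom from the question explicit instead of relying on a manual look at the directory.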