Thanks Aljoscha!

On Fri, Feb 5, 2021 at 1:48 AM Aljoscha Krettek <aljos...@apache.org> wrote:

> Hi Dan,
>
> I'm afraid this is not easily possible using the DataStream API in
> STREAMING execution mode today. However, there is one possible solution
> and we're introducing changes that will also make this work on STREAMING
> mode.
>
> The possible solution is to use the `FileSink` instead of the
> `StreamingFileSink`. This is an updated version of the sink that works
> in both BATCH and STREAMING mode (see [1]). If you use BATCH execution
> mode all your files should be "completed" at the end.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/dev/datastream_execution_mode.html
>
> The thing we're currently working on is FLIP-147 [2], which will allow
> sinks (and other operators) to always do one final checkpoint before
> shutting down. This will allow them to move the last outstanding
> inprogress files over to finished as well.
>
> [2] https://cwiki.apache.org/confluence/x/mw-ZCQ
>
> I hope that helps!
>
> Best,
> Aljoscha
>
> On 2021/02/04 21:37, Dan Hill wrote:
> >Hi Flink user group,
> >
> >*Background*
> >I'm changing a Flink SQL job to use Datastream. I'm updating an existing
> >Minicluster test in my code. It has a similar structure to other tests in
> >flink-tests. I call StreamExecutionEnvironment.execute. My tests sink
> >using StreamingFileSink Bulk Formats to tmp local disk.
> >
> >*Issue*
> >When I try to check the files on local disk, I see
> >".part-0-0.inprogress.1234abcd-5678-uuid...".
> >
> >*Question*
> >What's the best way to get the test to complete the outputs? I tried
> >checkpointing very frequently, sleeping, etc but these didn't work.
> >
> >Thanks!
> >- Dan
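
For the test-side check, here is a minimal sketch of how the output directory could be inspected once the job has finished. It only uses the JDK and assumes that in-progress files can be recognised by `.inprogress.` in their name, as in the listing quoted above. The class name `PartFiles`, its methods, and the file names in `main` are hypothetical and not part of Flink.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical test helper (not a Flink API): splits sink output into
// finished and in-progress part files based on the file name alone.
final class PartFiles {

    // Assumption: in-progress files contain ".inprogress." in their name,
    // e.g. ".part-0-0.inprogress.1234abcd-...".
    static boolean isInProgress(Path file) {
        return file.getFileName().toString().contains(".inprogress.");
    }

    // Walks the directory tree, since the sink may write into sub-directories.
    private static List<Path> list(Path dir, boolean wantInProgress) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(p -> isInProgress(p) == wantInProgress)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    static List<Path> inProgress(Path dir) throws IOException {
        return list(dir, true);
    }

    static List<Path> finished(Path dir) throws IOException {
        return list(dir, false);
    }

    public static void main(String[] args) throws IOException {
        // Simulated sink output; the file names are made up for illustration.
        Path dir = Files.createTempDirectory("sink-output");
        Files.createFile(dir.resolve("part-0-0"));
        Files.createFile(dir.resolve(".part-0-1.inprogress.1234abcd"));

        System.out.println("finished:    " + finished(dir));
        System.out.println("in progress: " + inProgress(dir));
    }
}
```

In a test, asserting that `PartFiles.inProgress(outputDir)` is empty before reading the results would make a leftover in-progress file fail loudly instead of silently producing too few records.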