Thanks Aljoscha!

On Fri, Feb 5, 2021 at 1:48 AM Aljoscha Krettek <aljos...@apache.org> wrote:

> Hi Dan,
>
> I'm afraid this is not easily possible using the DataStream API in
> STREAMING execution mode today. However, there is one possible solution
> and we're introducing changes that will also make this work in STREAMING
> mode.
>
> The possible solution is to use the `FileSink` instead of the
> `StreamingFileSink`. This is an updated version of the sink that works
> in both BATCH and STREAMING mode (see [1]). If you use BATCH execution
> mode all your files should be "completed" at the end.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/dev/datastream_execution_mode.html
>
> The thing we're currently working on is FLIP-147 [2], which will allow
> sinks (and other operators) to always do one final checkpoint before
> shutting down. This will allow them to move the last outstanding
> in-progress files over to finished as well.
>
> [2] https://cwiki.apache.org/confluence/x/mw-ZCQ
>
> I hope that helps!
>
> Best,
> Aljoscha
>
> On 2021/02/04 21:37, Dan Hill wrote:
> >Hi Flink user group,
> >
> >*Background*
> >I'm changing a Flink SQL job to use the DataStream API.  I'm updating an
> >existing MiniCluster test in my code.  It has a similar structure to other
> >tests in flink-tests.  I call StreamExecutionEnvironment.execute.  My tests
> >sink using StreamingFileSink Bulk Formats to tmp local disk.
> >
> >*Issue*
> >When I try to check the files on local disk, I see
> >".part-0-0.inprogress.1234abcd-5678-uuid...".
> >
> >*Question*
> >What's the best way to get the test to complete the outputs?  I tried
> >checkpointing very frequently, sleeping, etc., but these didn't work.
> >
> >Thanks!
> >- Dan
>
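
For anyone writing a similar test, here is a minimal sketch of a helper that checks whether a sink output directory still contains in-progress part files. It uses only the Java standard library and relies on the ".inprogress." marker seen in the file name quoted above. The class name InProgressFileCheck, the method name findInProgressFiles, and the sample file names in main are hypothetical and only for illustration; it is not part of Flink.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical test helper, not a Flink API.
class InProgressFileCheck {

    // Walks dir recursively and returns the sorted names of regular files
    // whose name contains ".inprogress." (the marker from the thread above).
    static List<String> findInProgressFiles(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths
                .filter(Files::isRegularFile)
                .map(p -> p.getFileName().toString())
                .filter(name -> name.contains(".inprogress."))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulated sink output; the file names here are made-up examples.
        Path dir = Files.createTempDirectory("sink-output");
        Files.createFile(dir.resolve("part-0-0"));
        Files.createFile(dir.resolve(".part-0-1.inprogress.example"));
        System.out.println(findInProgressFiles(dir));
    }
}
```

In a test, asserting that findInProgressFiles(outputDir) is empty after the job finishes would flag the situation described in the original question.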
