Hi Dan,
I'm afraid this is not easily possible using the DataStream API in
STREAMING execution mode today. However, there is a workaround, and
we're working on changes that will also make this work in STREAMING
mode.
The workaround is to use the `FileSink` instead of the
`StreamingFileSink`. This is an updated version of the sink that works
in both BATCH and STREAMING mode (see [1]). If you use BATCH execution
mode, all your files should be "completed" at the end of the job.
[1]
https://ci.apache.org/projects/flink/flink-docs-master/dev/datastream_execution_mode.html
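To make the workaround concrete, here is a minimal sketch of a bounded job that writes through `FileSink` in BATCH mode. It assumes Flink 1.12 or later with the `flink-connector-files` dependency on the classpath. The class name, job name, sample elements, and the row-format string encoder are placeholders chosen for illustration; since you use bulk formats, you would presumably swap in `FileSink.forBulkFormat(...)` with your own `BulkWriter.Factory`.

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical class name; not part of Flink or of the original test.
public class FileSinkBatchSketch {

    // Runs a small bounded job that writes its elements to outputDir.
    public static void run(String outputDir) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // BATCH mode only works when every source in the job is bounded.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        // Row format with a string encoder, used here only to keep the
        // sketch short; a bulk format would use forBulkFormat(...) instead.
        FileSink<String> sink = FileSink
                .forRowFormat(new Path(outputDir),
                        new SimpleStringEncoder<String>("UTF-8"))
                .build();

        // FileSink is attached with sinkTo(), not addSink().
        env.fromElements("a", "b", "c").sinkTo(sink);

        env.execute("file-sink-batch-sketch");
    }

    public static void main(String[] args) throws Exception {
        // The default output path is an arbitrary placeholder.
        run(args.length > 0 ? args[0] : "/tmp/file-sink-batch-sketch");
    }
}
```

The runtime mode can also be set from the command line with `-Dexecution.runtime-mode=BATCH` instead of in code, which keeps the job itself mode-agnostic.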
The thing we're currently working on is FLIP-147 [2], which will allow
sinks (and other operators) to always do one final checkpoint before
shutting down. This will allow them to move the last outstanding
in-progress files over to finished as well.
[2] https://cwiki.apache.org/confluence/x/mw-ZCQ
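In the meantime, a test can make the problem explicit by asserting that no in-progress files remain in the output directory. The helper below is only a sketch using plain `java.nio`. It assumes that in-progress part files carry `.inprogress` in their file name, as in the example from your mail, and the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical test helper; not part of Flink.
public class InProgressFileCheck {

    // Returns every regular file under dir (searched recursively) whose
    // name contains ".inprogress". Hidden files are included, since
    // in-progress part files start with a dot.
    public static List<Path> findInProgressFiles(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().contains(".inprogress"))
                    .collect(Collectors.toList());
        }
    }
}
```

A test would call `findInProgressFiles` on the sink's base directory after the job has finished and fail if the returned list is not empty.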
I hope that helps!
Best,
Aljoscha
On 2021/02/04 21:37, Dan Hill wrote:
Hi Flink user group,
*Background*
I'm changing a Flink SQL job to use the DataStream API. I'm updating an
existing MiniCluster test in my code. It has a similar structure to other
tests in flink-tests. I call StreamExecutionEnvironment.execute. My tests
write to a temporary directory on local disk using StreamingFileSink bulk
formats.
*Issue*
When I try to check the files on local disk, I see
".part-0-0.inprogress.1234abcd-5678-uuid...".
*Question*
What's the best way to get the test to complete the outputs? I tried
checkpointing very frequently, sleeping, etc., but none of these worked.
Thanks!
- Dan