Hi all,

I just ran into a problem and I am not sure what the expected behavior is.
Should this be considered a bug, or is there some other way it should be
handled?

I have a streaming job whose input stream eventually closes. The job sinks
to a StreamingFileSink.
The problem I am experiencing is that any data written to the sink between
the last checkpoint and the close of the stream is lost.

This happens (AFAICT) because the StreamingFileSink relies on checkpoints
to commit files, and closing the stream currently does not try to commit
anything.

It seems like just making close call
`buckets.commitUpToCheckpoint(Long.MAX_VALUE)` would work pretty well,
assuming it is an actual stream close, but it could be problematic in the
event of a savepoint/cancel followed by a later resume (though it may only
mean some files would be prematurely committed). Ideally, we would be able
to differentiate between the two types of close (an actual stream
finishing vs. a cancel), but at the moment that doesn't seem to be supported.
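
To make the behavior concrete, here is a minimal, self-contained toy model
of it. This is not Flink code: `ToyCheckpointSink`, its `snapshot` method
and the `commitOnClose` flag are hypothetical names I made up for
illustration, and only the name `commitUpToCheckpoint` mirrors the call
mentioned above. It is a sketch under the assumption that records become
visible only once a checkpoint covering them is committed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a sink that only commits data on checkpoints. Not Flink code. */
final class ToyCheckpointSink {
    // Hypothetical switch: commit whatever is left when close() is called.
    private final boolean commitOnClose;
    // Records written since the last snapshot.
    private final List<String> inProgress = new ArrayList<>();
    // Records snapshotted but not yet committed, keyed by checkpoint id.
    private final TreeMap<Long, List<String>> pending = new TreeMap<>();
    // Records that are durably visible to readers.
    private final List<String> committed = new ArrayList<>();

    ToyCheckpointSink(boolean commitOnClose) {
        this.commitOnClose = commitOnClose;
    }

    void write(String record) {
        inProgress.add(record);
    }

    /** Moves in-progress records to pending under the given checkpoint id. */
    void snapshot(long checkpointId) {
        pending.put(checkpointId, new ArrayList<>(inProgress));
        inProgress.clear();
    }

    /** Commits every pending batch whose checkpoint id is <= checkpointId. */
    void commitUpToCheckpoint(long checkpointId) {
        Map<Long, List<String>> ready = pending.headMap(checkpointId, true);
        for (List<String> batch : ready.values()) {
            committed.addAll(batch);
        }
        ready.clear(); // the view is backed by 'pending', so this removes them
    }

    /** Without commitOnClose, anything written after the last checkpoint is dropped. */
    void close() {
        if (commitOnClose) {
            snapshot(Long.MAX_VALUE);
            commitUpToCheckpoint(Long.MAX_VALUE);
        }
    }

    List<String> committed() {
        return List.copyOf(committed);
    }

    public static void main(String[] args) {
        for (boolean commitOnClose : new boolean[] {false, true}) {
            ToyCheckpointSink sink = new ToyCheckpointSink(commitOnClose);
            sink.write("a");
            sink.write("b");
            sink.snapshot(1);
            sink.commitUpToCheckpoint(1);
            sink.write("c"); // written after the last checkpoint
            sink.close();
            System.out.println("commitOnClose=" + commitOnClose
                    + " -> " + sink.committed());
        }
    }
}
```

Running `main` prints `commitOnClose=false -> [a, b]` and then
`commitOnClose=true -> [a, b, c]`. The toy model cannot tell a finished
stream from a cancel either, so with the flag on it would also commit
early in the savepoint/cancel case described above.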

If this is considered a bug, please let me know and I will file a Jira. If
not, what is the "correct" way to get all the data out of a sink that
relies on a checkpoint to commit data?

Thanks
