Re: StreamingFileSink duplicate data

2019-11-21 Thread Paul Lam
Hi, StreamingFileSink does not remove committed files, so if you restore state from a checkpoint other than the latest one, you may need to clean up manually. Regarding the part ID issue, StreamingFileSink tracks the global max part number and uses this value + 1 as the new ID upon restoring. In this
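
To illustrate the part ID behaviour described in this reply, here is a minimal sketch, not Flink's actual code. The class name, method names, and counter values are hypothetical; the only thing taken from the thread is the rule "global max part number + 1 upon restoring". It assumes each subtask's highest part number is available at restore time.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; these names are not part of Flink's API.
public class PartCounterSketch {

    // Takes the max part counters recorded across subtasks (assumed input)
    // and returns the id the next part file would get after a restore.
    public static long restoreNextPartId(List<Long> checkpointedMaxCounters) {
        long globalMax = 0L;
        for (long counter : checkpointedMaxCounters) {
            globalMax = Math.max(globalMax, counter);
        }
        // Rule from the thread: new id = global max + 1.
        return globalMax + 1;
    }

    // Builds a name in the part-<subtask>-<id> shape seen in the thread.
    public static String partFileName(int subtaskIndex, long partId) {
        return "part-" + subtaskIndex + "-" + partId;
    }

    public static void main(String[] args) {
        // Made-up counters: some subtask reached 71, another reached 88.
        List<Long> counters = Arrays.asList(12L, 71L, 88L);
        long next = restoreNextPartId(counters);
        System.out.println(partFileName(6, next));
    }
}
```

Under this assumption, a subtask whose own last file had a lower number would still continue from the global maximum, so its part numbers can jump forward after a restore instead of continuing from its own last value.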

StreamingFileSink duplicate data

2019-11-20 Thread Lei Nie
Hello, I would like clarification on the StreamingFileSink, thank you. From my testing, it seems that resuming a job from a checkpoint does *not* also restore the rolling part counter. E.g., the job may have stopped with last file: *part-6-71* But when resuming from the most recent checkpoint: *part-6-89* (