Hello, I would like clarification on the StreamingFileSink, thank you. From my testing, it seems that resuming a job from a checkpoint does *not* also restore the rolling part counter.
E.g., the job may have stopped with last file *part-6-71*, but when resuming from the most recent checkpoint it writes *part-6-89* (there is an unexplained gap).

This is a problem if I have an issue with my job and need to roll back *more than one checkpoint*. After rolling back to, e.g., the 4th-last checkpoint, the data will be written under *different part file names*, causing duplication.

-----------------------------------------------------------------
For example, checkpoints: *chk-17, chk-18, chk-19, chk-20*
Original data: *part-1-5, part-1-6, part-1-7*
Rollback to *chk-17*, which writes *part-1-18*, but with the same data as *part-1-5*! This is a duplicate.
------------------------------------------------------------------

Am I correct? How can I avoid this?
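
To make the scenario concrete, here is a toy Python sketch of the behavior I think I am seeing. It is not Flink code and does not use any Flink API: `run_job`, the batch contents, and the starting counter values (5 and 18) are assumptions made up for illustration, taken from the example numbers above.

```python
from collections import defaultdict


def run_job(batches, subtask, first_counter):
    """Toy model: write each batch to a new part file, numbering from first_counter."""
    return {
        f"part-{subtask}-{first_counter + i}": batch
        for i, batch in enumerate(batches)
    }


# Hypothetical records processed after chk-17 (identical input on both runs).
batches = [("a", "b"), ("c", "d"), ("e", "f")]

# First run: files part-1-5, part-1-6, part-1-7.
original = run_job(batches, subtask=1, first_counter=5)

# Rollback to chk-17: same input is replayed, but the counter is assumed
# to continue from 18 instead of 5, so the names do not overwrite the old files.
replayed = run_job(batches, subtask=1, first_counter=18)

# Both sets of files end up side by side in the output directory.
output_dir = {**original, **replayed}

# Group file names by content to find the same data under different names.
files_by_content = defaultdict(list)
for name, content in output_dir.items():
    files_by_content[content].append(name)

duplicates = {c: names for c, names in files_by_content.items() if len(names) > 1}
print(duplicates)
```

In this toy model every batch ends up in two files, because the replayed run never reuses the original part names. If the counter were restored along with the checkpoint, the replayed files would have the same names as the originals and could replace them instead.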