Hi,

your concerns about deleting files when using incremental checkpoints are very valid. Deleting empty checkpoint folders is obviously ok. As for files, I have recently added some additional logging to the checkpointing mechanism that reports the files referenced by the last checkpoint. I will try to also include this logging in 1.3.2. Based on it, you could make safe assumptions about which files are actually orphaned. I am even considering packing this list as a plain text file with the checkpoint, to make this more transparent for users.
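To make this concrete, here is a minimal sketch of what such a cleanup could look like once that list exists. Everything in it is an assumption for illustration: the bucket, prefix, manifest name ("referenced-files.txt"), and JobManager URL are placeholders, and the manifest format (one S3 key per line) is not something Flink produces today.

import boto3
import requests

# All of the following names are placeholders for illustration only.
BUCKET = "my-flink-bucket"
CHECKPOINT_PREFIX = "checkpoints/my-job/"
# Hypothetical manifest: one S3 key per line, listing the files that the
# latest completed checkpoint still references (not produced by Flink today).
MANIFEST_KEY = CHECKPOINT_PREFIX + "referenced-files.txt"
JOBMANAGER_URL = "http://jobmanager:8081"

s3 = boto3.client("s3")

def jobmanager_reachable():
    # Only prune while the JobManager is up, so the script never races a
    # recovery. The monitoring endpoint path varies by Flink version.
    try:
        return requests.get(JOBMANAGER_URL + "/joboverview", timeout=5).ok
    except requests.RequestException:
        return False

def referenced_files():
    # Read the plain-text list of referenced files from S3.
    body = s3.get_object(Bucket=BUCKET, Key=MANIFEST_KEY)["Body"].read()
    return {line.strip() for line in body.decode().splitlines() if line.strip()}

def prune_orphans(dry_run=True):
    if not jobmanager_reachable():
        print("JobManager not reachable; skipping this cleanup run.")
        return
    keep = referenced_files() | {MANIFEST_KEY}
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=CHECKPOINT_PREFIX):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key not in keep:
                print(("would delete: " if dry_run else "deleting: ") + key)
                if not dry_run:
                    s3.delete_object(Bucket=BUCKET, Key=key)

if __name__ == "__main__":
    # Start with dry_run=True and inspect the output before actually deleting.
    prune_orphans(dry_run=True)

Note that this also speaks to the caveat from your mail below: rather than trying to guess whether a job is merely paused, the script simply skips a run whenever it cannot confirm the JobManager is up.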
Best,
Stefan

> On 26.07.2017 at 16:57, prashantnayak <prash...@intellifylearning.com> wrote:
>
> Thanks Stephan and Stefan
>
> We're looking forward to this patch in 1.3.2. We will use a patched version,
> depending on when 1.3.2 becomes available.
>
> We're also implementing a cron job to remove orphaned/older
> completedCheckpoint files per your recommendations. One caveat with a job
> like that is that we have to check whether a particular job is
> stopped/paused/down, and also whether the JobManager is down, so we don't
> accidentally remove valid checkpoint files. This makes it a bit dicey;
> ideally, of course, we would not have to do this at all.
>
> The move away from hadoop/s3 would be welcome as well.
>
> Flink job state is critical to us, since we have very long-running jobs
> (months) processing hundreds of millions of records.
>
> Thanks
> Prashant