Hi,

your concerns about deleting files when using incremental checkpoints are very valid. Deleting empty checkpoint folders is obviously ok. As for files, I have recently added some additional logging to the checkpointing mechanism that reports the files referenced by the last checkpoint. I will try to also include this logging in 1.3.2. Based on it, you could make safe assumptions about which files are actually orphaned. I am even considering packing this list as a plain text file with the checkpoint, to make this more transparent for users.
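To make this concrete, here is a minimal sketch of what such a cleanup could look like once that list exists. Everything in it is an assumption for illustration: the bucket, prefix, manifest name ("referenced-files.txt"), and JobManager URL are placeholders, and the manifest format (one S3 key per line) is not something Flink produces today.

import boto3
import requests

# All of the following names are placeholders for illustration only.
BUCKET = "my-flink-bucket"
CHECKPOINT_PREFIX = "checkpoints/my-job/"
# Hypothetical manifest: one S3 key per line, listing the files that the
# latest completed checkpoint still references (not produced by Flink today).
MANIFEST_KEY = CHECKPOINT_PREFIX + "referenced-files.txt"
JOBMANAGER_URL = "http://jobmanager:8081"

s3 = boto3.client("s3")

def jobmanager_reachable():
    # Only prune while the JobManager is up, so the script never races a
    # recovery. The monitoring endpoint path varies by Flink version.
    try:
        return requests.get(JOBMANAGER_URL + "/joboverview", timeout=5).ok
    except requests.RequestException:
        return False

def referenced_files():
    # Read the plain-text list of referenced files from S3.
    body = s3.get_object(Bucket=BUCKET, Key=MANIFEST_KEY)["Body"].read()
    return {line.strip() for line in body.decode().splitlines() if line.strip()}

def prune_orphans(dry_run=True):
    if not jobmanager_reachable():
        print("JobManager not reachable; skipping this cleanup run.")
        return
    keep = referenced_files() | {MANIFEST_KEY}
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=CHECKPOINT_PREFIX):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key not in keep:
                print(("would delete: " if dry_run else "deleting: ") + key)
                if not dry_run:
                    s3.delete_object(Bucket=BUCKET, Key=key)

if __name__ == "__main__":
    # Start with dry_run=True and inspect the output before actually deleting.
    prune_orphans(dry_run=True)

Note that this also speaks to the caveat from your mail below: rather than trying to guess whether a job is merely paused, the script simply skips a run whenever it cannot confirm the JobManager is up.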
Best,
Stefan

> On 26.07.2017 at 16:57, prashantnayak <prash...@intellifylearning.com> wrote:
>
> Thanks Stephan and Stefan
>
> We're looking forward to this patch in 1.3.2. We will use a patched version,
> depending on when 1.3.2 becomes available.
>
> We're also implementing a cron job to remove orphaned/older
> completedCheckpoint files per your recommendations. One caveat with a job
> like that is that we have to check whether a particular job is
> stopped/paused/down, and also whether the JobManager is down, so we don't
> accidentally remove valid checkpoint files. This makes it a bit dicey;
> ideally, of course, we would not have to do this at all.
>
> The move away from hadoop/s3 would be welcome as well.
>
> Flink job state is critical to us, since we have very long-running jobs
> (months) processing hundreds of millions of records.
>
> Thanks
> Prashant