Thanks, Stephan and Stefan. We're looking forward to this patch in 1.3.2.
We will use a patched version, depending on when 1.3.2 becomes available. We're also implementing a cron job to remove orphaned and older completedCheckpoint files, per your recommendations. One caveat with a job like that is that it has to check whether a particular job is stopped, paused, or down, and whether the Job Manager is down, so that we don't accidentally remove valid checkpoint files. This makes it a bit dicey. The ideal, of course, is not to have to do this at all.

The move away from hadoop/s3 would be welcome as well. Flink job state is critical to us, since we have very long-running jobs (months) processing hundreds of millions of records.

Thanks,
Prashant

--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/S3-recovery-and-checkpoint-directories-exhibit-explosive-growth-tp14270p14477.html
Sent from the Apache Flink User Mailing List archive at Nabble.com.
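
P.S. For anyone attempting a similar cleanup job, here is a minimal sketch of the guard logic described above. It is not Flink code and calls no Flink or S3 API: `CheckpointFile`, `select_orphans`, and the `jobmanager_up` / `job_running` flags are hypothetical names, and how those flags and the file listing are obtained is left to the surrounding cron script. The assumption is that nothing is deleted unless both the Job Manager and the job are confirmed healthy, and that the newest file is always retained.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class CheckpointFile:
    """Hypothetical record for one completedCheckpoint file in the recovery directory."""
    path: str
    modified: datetime


def select_orphans(
    files: Iterable[CheckpointFile],
    now: datetime,
    min_age: timedelta,
    jobmanager_up: bool,
    job_running: bool,
    keep_newest: int = 1,
) -> List[str]:
    """Return the paths that look safe to delete, newest first.

    If the Job Manager or the job is not confirmed healthy, return nothing:
    an old file may be the only valid checkpoint of a stopped or paused job.
    """
    if not (jobmanager_up and job_running):
        return []

    # Always retain the newest `keep_newest` files, regardless of their age.
    newest_first = sorted(files, key=lambda f: f.modified, reverse=True)
    candidates = newest_first[keep_newest:]

    # Only files strictly older than the cutoff are treated as orphaned.
    cutoff = now - min_age
    return [f.path for f in candidates if f.modified < cutoff]
```

The function only selects paths; the actual delete is left to the caller so that the cron job can log the candidates, or run in a dry-run mode, before removing anything.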