Hi,

I think Stephan was talking about removing this part:

        try {
                FileUtils.deletePathIfEmpty(fs, filePath.getParent());
        } catch (Exception ignored) {}


This part should *NOT* be removed:

        fs.delete(filePath, false);

The reason is that the first part is only an additional cleanup: for each 
deleted file, it checks whether the containing directory is now empty and, if 
so, removes the directory. Checking for an empty directory requires listing 
all files in the directory, which is expensive on S3. The only downside of 
removing that line is orphaned empty checkpoint directories, which could be 
cleaned up by a script.
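
For illustration, here is a minimal sketch of what such a cleanup script could 
look like, using Hadoop's FileSystem API. The class name, the checkpoint root 
path, and the assumption of one sub-directory per checkpoint are made up for 
the example and are not part of Flink:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileStatus;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        // Hypothetical offline cleanup for orphaned empty checkpoint directories.
        public class EmptyCheckpointDirCleanup {

            public static void main(String[] args) throws Exception {
                // e.g. "s3://my-bucket/flink/checkpoints" (assumed: one
                // sub-directory per checkpoint)
                Path checkpointRoot = new Path(args[0]);
                FileSystem fs = checkpointRoot.getFileSystem(new Configuration());

                for (FileStatus dir : fs.listStatus(checkpointRoot)) {
                    if (!dir.isDirectory()) {
                        continue;
                    }
                    // Listing the directory is the expensive part on S3, which is
                    // why it is better done once here than once per deleted file.
                    if (fs.listStatus(dir.getPath()).length == 0) {
                        // non-recursive delete, only removes an empty directory
                        fs.delete(dir.getPath(), false);
                    }
                }
                fs.close();
            }
        }

Running something like this once in a while keeps the listing cost out of the 
per-file delete path.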

Deleting the file itself must remain, because that is how we release files 
from old checkpoints.

Best,
Stefan

> Am 26.07.2017 um 04:31 schrieb prashantnayak <prash...@intellifylearning.com>:
> 
> Hi Stephan
> 
> Unclear on what you mean by the "trash" option... thought that was only
> available for command-line Hadoop and not applicable to the API, which is what
> Flink uses?  If there is a configuration for the Flink/Hadoop connector,
> please let me know.
> 
> Also, one additional thing about S3... S3 supports this option of
> "versioned" buckets... Since versioning basically results in a delete on an
> object not actually deleting it (unless there is a bucket lifecycle
> policy)... I think you should recommend that Flink users who rely on S3
> turn off bucket versioning, since it seems to not really be a factor for
> Flink...
> 
> Thanks
> Prashant
> 
> 
> 
> --
> View this message in context: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/S3-recovery-and-checkpoint-directories-exhibit-explosive-growth-tp14270p14453.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive at 
> Nabble.com.
