[ https://issues.apache.org/jira/browse/HADOOP-18948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran resolved HADOOP-18948.
-------------------------------------
    Fix Version/s: 3.4.0
     Release Note: S3A directory delete and rename will optionally abort all pending uploads under the to-be-deleted paths when fs.s3a.directory.operations.purge.uploads is true. It is off by default.
       Resolution: Fixed

> S3A. Add option fs.s3a.directory.operations.purge.uploads to purge on rename/delete
> -----------------------------------------------------------------------------------
>
>                 Key: HADOOP-18948
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18948
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>
> On third-party stores without lifecycle rules it's possible to accrue many GB
> of pending multipart uploads, including from
> * magic committer jobs where the spark driver/MR AM failed before commit/abort
> * distcp jobs which time out and get aborted
> * any client code writing datasets which is interrupted before close.
> Although there's a purge pending uploads option, that's dangerous because if
> any fs is instantiated with it, it can destroy in-flight work.
> Otherwise, the "hadoop s3guard uploads" command does work but needs
> scheduling/manual execution.
> proposed: add a new property {{fs.s3a.directory.operations.purge.uploads}}
> which will automatically cancel all pending uploads under a path
> * delete: everything under the dir
> * rename: all under the source dir
> This will be done in parallel to the normal operation, but with no attempt to
> post abortMultipartUploads in different threads. The assumption here is that
> this is rare. And it'll be off by default, as on AWS people should have rules
> for these things.
> + doc (third_party?)
> + add new counter/metric for abort operations, count and duration
> + test to include cost assertions



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
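
The selection rule in the proposal (delete: everything under the dir; rename: all under the source dir) can be illustrated with a small standalone sketch. This is not the S3A implementation and does not use any Hadoop or AWS SDK classes: the class `PurgeUploadsSketch`, the type `PendingUpload`, and the method `uploadsToAbort` are hypothetical names invented for this example, and the `purgeEnabled` flag simply stands in for `fs.s3a.directory.operations.purge.uploads`. It only shows which pending uploads would fall "under a path", assuming uploads are identified by destination object key and matched by directory prefix.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only; NOT the S3A code.
 * Models which pending multipart uploads would be selected for abort
 * when a directory is deleted, or is the source of a rename.
 */
final class PurgeUploadsSketch {

    /** Hypothetical model of a pending upload: destination key plus upload ID. */
    static final class PendingUpload {
        final String key;
        final String uploadId;

        PendingUpload(String key, String uploadId) {
            this.key = key;
            this.uploadId = uploadId;
        }
    }

    /** Turn a directory key into a prefix; the empty string means the bucket root. */
    static String toDirPrefix(String dir) {
        if (dir.isEmpty() || dir.endsWith("/")) {
            return dir;
        }
        // The trailing slash stops "data/job1" from matching "data/job10/...".
        return dir + "/";
    }

    /**
     * Select the uploads under the given directory.
     * purgeEnabled stands in for fs.s3a.directory.operations.purge.uploads;
     * when it is false (the default), nothing is selected.
     */
    static List<PendingUpload> uploadsToAbort(
            List<PendingUpload> pending, String dir, boolean purgeEnabled) {
        List<PendingUpload> selected = new ArrayList<>();
        if (!purgeEnabled) {
            return selected;
        }
        String prefix = toDirPrefix(dir);
        for (PendingUpload upload : pending) {
            if (upload.key.startsWith(prefix)) {
                selected.add(upload);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<PendingUpload> pending = new ArrayList<>();
        pending.add(new PendingUpload("data/job1/__magic/part-0000", "id-1"));
        pending.add(new PendingUpload("data/job10/part-0000", "id-2"));
        pending.add(new PendingUpload("other/file.bin", "id-3"));

        // Deleting (or renaming from) data/job1 with the option enabled.
        for (PendingUpload upload : uploadsToAbort(pending, "data/job1", true)) {
            System.out.println("would abort " + upload.uploadId + " for " + upload.key);
        }
    }
}
```

For a rename, the same selection would be applied to the source directory only; uploads under the destination are left alone in this sketch.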