[ https://issues.apache.org/jira/browse/HADOOP-18568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Nauroth resolved HADOOP-18568.
------------------------------------
    Fix Version/s: 3.4.2
     Hadoop Flags: Reviewed
         Assignee: Sayed Mohammad Hossein Torabi
       Resolution: Fixed

> Magic Committer optional clean up
> ----------------------------------
>
>                 Key: HADOOP-18568
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18568
>             Project: Hadoop Common
>          Issue Type: Wish
>          Components: fs/s3
>    Affects Versions: 3.3.3
>            Reporter: André F.
>            Assignee: Sayed Mohammad Hossein Torabi
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.2
>
>
> It seems that deleting the `__magic` folder can take a really long time,
> depending on the number of tasks/partitions used in a given Spark job. I'm
> seeing the following behavior on a Spark job (processing ~30TB, with
> ~420k tasks) using the magic committer:
> {code:java}
> 2022-12-10T21:25:19.629Z pool-3-thread-32 INFO MagicS3GuardCommitter:
> Starting: Deleting magic directory s3a://my-bucket/random_hash/__magic
> 2022-12-10T21:52:03.250Z pool-3-thread-32 INFO MagicS3GuardCommitter:
> Deleting magic directory s3a://my-bucket/random_hash/__magic: duration
> 26:43.620s {code}
> I don't see a way out of it, since deleting S3 objects requires listing
> all objects under a prefix, and this is what may be taking too much time.
> Could we somehow make this cleanup optional? (The idea would be to delegate
> it to S3 lifecycle policies so that it does not add this overhead to the
> commit phase.)
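
For illustration only, here is a minimal sketch of the lifecycle-policy idea raised in the report: it builds an S3 lifecycle configuration that expires whatever is left under a job's `__magic` prefix. It is not part of the fix and does not show the Hadoop-side setting. The function name `magic_cleanup_lifecycle`, the rule ID, and the one-day expiry are assumptions made for this example. Lifecycle filters match on key prefix, so the sketch also assumes the job's output path is known in advance.

```python
import json


def magic_cleanup_lifecycle(job_prefix: str, days: int = 1) -> dict:
    """Build a lifecycle configuration that expires leftover __magic objects.

    Hypothetical helper for illustration; the name and defaults are not
    taken from Hadoop or from the issue.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    # Lifecycle filters are prefix-based, so normalise the job path and
    # append the magic directory name used by the committer.
    magic_prefix = job_prefix.strip("/") + "/__magic/"
    return {
        "Rules": [
            {
                "ID": "expire-magic-committer-leftovers",  # assumed rule name
                "Status": "Enabled",
                "Filter": {"Prefix": magic_prefix},
                # S3 removes matching objects once they are `days` old.
                "Expiration": {"Days": days},
            }
        ]
    }


if __name__ == "__main__":
    # Uses the placeholder path from the log excerpt in the report.
    print(json.dumps(magic_cleanup_lifecycle("random_hash"), indent=2))
```

A dictionary of this shape is what boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument. That call replaces the bucket's entire existing lifecycle configuration, so any rules already in place would need to be merged in first.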