Daniel Carl Jones created HADOOP-18420:
------------------------------------------

             Summary: Optimise S3A’s recursive delete to drop successful S3 
keys on retry of S3 DeleteObjects
                 Key: HADOOP-18420
                 URL: https://issues.apache.org/jira/browse/HADOOP-18420
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs/s3
            Reporter: Daniel Carl Jones


S3A users with large filesystems can run into S3 throttling when renames or 
deletes trigger bulk deletes of keys. These deletes are currently issued in 
batches of up to 250 keys 
([https://github.com/apache/hadoop/blob/c1d82cd95e375410cb0dffc2931063d48687386f/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java#L319-L323]).
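For context, a minimal sketch of tuning that batch size; it assumes the 
{{fs.s3a.bulk.delete.page.size}} property behind the constants linked above 
(default 250, with DeleteObjects itself capped at 1000 keys per request):

{code:java}
import org.apache.hadoop.conf.Configuration;

/** Sketch only: shrink the S3A bulk delete page size to soften throttling. */
public class SmallerDeletePages {
  public static Configuration create() {
    Configuration conf = new Configuration();
    // Assumed property name from the linked Constants.java; default is 250,
    // and S3 DeleteObjects accepts at most 1000 keys per request.
    conf.setInt("fs.s3a.bulk.delete.page.size", 100);
    return conf;
  }
}
{code}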

When the bulk delete ([S3 DeleteObjects|https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html]) 
fails, the response lists the keys that failed and why. Today, S3A recovers 
from throttling by resending the DeleteObjects request unchanged, so keys that 
were already deleted are submitted again, and those redundant deletes count 
towards the same throttling limits.
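For reference, a rough sketch (assuming the v1 AWS SDK that S3A builds 
against) of how those per-key failures surface to the caller:

{code:java}
import java.util.List;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.DeleteObjectsRequest;
import com.amazonaws.services.s3.model.DeleteObjectsRequest.KeyVersion;
import com.amazonaws.services.s3.model.MultiObjectDeleteException;

/** Sketch only: issue one bulk delete and report the per-key failures. */
public class BulkDeleteSketch {
  static void deleteOnce(AmazonS3 s3, String bucket, List<KeyVersion> keys) {
    try {
      s3.deleteObjects(new DeleteObjectsRequest(bucket).withKeys(keys));
    } catch (MultiObjectDeleteException e) {
      // The exception names exactly which keys failed and the error code,
      // e.g. "SlowDown" for a throttled key.
      for (MultiObjectDeleteException.DeleteError error : e.getErrors()) {
        System.err.printf("failed to delete %s: %s (%s)%n",
            error.getKey(), error.getCode(), error.getMessage());
      }
    }
  }
}
{code}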

Instead, S3A should retry only the keys that failed, limiting the number of 
mutations against the S3 bucket and hopefully mitigating throttling errors 
when deleting a large number of objects.
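A hedged sketch of the proposed behaviour, again against the v1 SDK and with 
illustrative names only: each retry resubmits just the keys listed in the 
previous failure rather than the whole page.

{code:java}
import java.util.List;
import java.util.stream.Collectors;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.DeleteObjectsRequest;
import com.amazonaws.services.s3.model.DeleteObjectsRequest.KeyVersion;
import com.amazonaws.services.s3.model.MultiObjectDeleteException;

/** Sketch only: retry a bulk delete with just the keys that failed last time. */
public class RetryFailedKeysSketch {
  static void deleteWithRetry(AmazonS3 s3, String bucket,
      List<KeyVersion> keys, int maxAttempts) {
    List<KeyVersion> remaining = keys;
    for (int attempt = 1; attempt <= maxAttempts && !remaining.isEmpty(); attempt++) {
      try {
        s3.deleteObjects(new DeleteObjectsRequest(bucket).withKeys(remaining));
        return; // every remaining key was deleted
      } catch (MultiObjectDeleteException e) {
        // Drop the keys that succeeded: only the failed ones go into the next
        // request. Real code would also back off and check the error codes.
        remaining = e.getErrors().stream()
            .map(error -> new KeyVersion(error.getKey()))
            .collect(Collectors.toList());
      }
    }
  }
}
{code}

Compared with resending the full page, this keeps the number of mutations per 
retry proportional to what actually failed.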



