Steve Loughran created HADOOP-19347:
---------------------------------------

             Summary: AWS SDK deleteObjects() and S3Store.deleteObjects() don't 
handle 500 failures of individual objects
                 Key: HADOOP-19347
                 URL: https://issues.apache.org/jira/browse/HADOOP-19347
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 3.4.1
            Reporter: Steve Loughran


S3Store.deleteObjects() encountered a 500 error and didn't recover.

We normally assume that 500 errors have already been retried by the SDK, so our
own retry logic doesn't retry them.

The root cause is that 500 errors can surface within the bulk delete.
* The delete POST returns 200, so the SDK is happy
* but one of the rows in the response reports the S3Error "InternalError":
{{Code=InternalError, Message=We encountered an internal error. Please try again.}}
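
A minimal sketch of the check this implies: because the POST itself returns 200, the per-key rows have to be inspected to find the failure. {{DeleteErrorRow}} and {{BulkDeleteErrorScan}} are hypothetical stand-ins written for this illustration; they are not the AWS SDK or S3A classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for one per-key error row of a bulk delete response. */
final class DeleteErrorRow {
  final String key;
  final String code;
  final String message;

  DeleteErrorRow(String key, String code, String message) {
    this.key = key;
    this.code = code;
    this.message = message;
  }
}

/** Sketch: find rows reporting a server-side failure despite the HTTP 200. */
final class BulkDeleteErrorScan {
  /** Error code taken from the failure reported in this issue. */
  static final String INTERNAL_ERROR = "InternalError";

  private BulkDeleteErrorScan() {
  }

  /** Return the keys whose row carries the InternalError code. */
  static List<String> keysWithInternalError(List<DeleteErrorRow> rows) {
    List<String> keys = new ArrayList<>();
    for (DeleteErrorRow row : rows) {
      if (INTERNAL_ERROR.equals(row.code)) {
        keys.add(row.key);
      }
    }
    return keys;
  }

  public static void main(String[] args) {
    // Two illustrative rows: only the first is a 500-class failure.
    List<DeleteErrorRow> rows = Arrays.asList(
        new DeleteErrorRow("dir/file1", "InternalError",
            "We encountered an internal error. Please try again."),
        new DeleteErrorRow("dir/file2", "AccessDenied", "Access Denied"));
    System.out.println(keysWithInternalError(rows));
  }
}
```

Rows with other codes, such as the illustrative AccessDenied, are left out, since retrying them would not be expected to help.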


Proposed:
* The bulk delete invoker must map "InternalError" to AWSStatus500Exception and
throw that.
* Add a retry policy for bulk deletes which considers AWSStatus500Exception as
retriable. We currently don't retry, on the assumption that the SDK will,
which it does for base retries, but clearly not for multiobject delete.
* Maybe also consider the possibility that a partial 503 response could be
generated, that is, that only part of the delete is throttled?
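
The proposal could be sketched roughly as follows. This is an illustration under stated assumptions, not the S3A implementation: {{PartialDelete500Exception}} stands in for AWSStatus500Exception, and {{BulkDeleteCall}} and {{deleteWithRetries}} are hypothetical names. Resubmitting only the failed keys and using a linear backoff are choices made for the sketch, not something the issue specifies.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical 500-class failure raised when some keys report InternalError. */
final class PartialDelete500Exception extends Exception {
  final List<String> failedKeys;

  PartialDelete500Exception(List<String> failedKeys) {
    super("InternalError deleting " + failedKeys);
    this.failedKeys = failedKeys;
  }
}

/** Hypothetical bulk delete: returns normally only if every key was deleted. */
interface BulkDeleteCall {
  void delete(List<String> keys) throws PartialDelete500Exception;
}

final class BulkDeleteRetrySketch {
  private BulkDeleteRetrySketch() {
  }

  /**
   * Invoke the delete, treating the 500-class failure as retriable and
   * resubmitting only the keys which failed.
   * @return the number of attempts made
   */
  static int deleteWithRetries(BulkDeleteCall call, List<String> keys,
      int maxAttempts, long sleepMillis)
      throws PartialDelete500Exception, InterruptedException {
    List<String> pending = keys;
    for (int attempt = 1; ; attempt++) {
      try {
        call.delete(pending);
        return attempt;
      } catch (PartialDelete500Exception e) {
        if (attempt >= maxAttempts) {
          // Out of attempts: surface the failure to the caller.
          throw e;
        }
        pending = e.failedKeys;
        // Linear backoff before the next attempt.
        Thread.sleep(sleepMillis * attempt);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    final int[] calls = {0};
    // Simulated store: the first call fails for one key, later calls succeed.
    BulkDeleteCall flaky = keys -> {
      if (calls[0]++ == 0) {
        throw new PartialDelete500Exception(Arrays.asList("dir/file1"));
      }
    };
    int attempts = deleteWithRetries(flaky,
        Arrays.asList("dir/file1", "dir/file2"), 3, 10L);
    System.out.println("attempts=" + attempts);
  }
}
```

A partial 503 could be handled in the same loop by raising a separate throttling exception with a longer backoff.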

{code}

Caused by: org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteException: [S3Error(Key=table/warehouse/tablespace/external/hive/table/-tmp.-ext-10000/file/, Code=InternalError, Message=We encountered an internal error. Please try again.)]
        at org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObjects(S3AFileSystem.java:3186)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeysS3(S3AFileSystem.java:3422)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeys(S3AFileSystem.java:3481)
        at org.apache.hadoop.fs.s3a.S3AFileSystem$OperationCallbacksImpl.removeKeys(S3AFileSystem.java:2558)
        at org.apache.hadoop.fs.s3a.impl.RenameOperation.lambda$removeSourceObjects$3(RenameOperation.java:625)
        at org.apache.hadoop.fs.s3a.Invoker.lambda$once$0(Invoker.java:165)
        at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:122)
  
{code}





--
This message was sent by Atlassian Jira
(v8.20.10#820010)
