Steve Loughran created HADOOP-19347:
---------------------------------------

             Summary: AWS SDK deleteObjects() and S3Store.deleteObjects() don't handle 500 failures of individual objects
                 Key: HADOOP-19347
                 URL: https://issues.apache.org/jira/browse/HADOOP-19347
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 3.4.1
            Reporter: Steve Loughran


S3Store.deleteObjects() encountered a 500 error and didn't recover.

We normally assume that 500 errors are already retried by the SDK, so our own retry logic doesn't retry them.

The root cause is that 500 errors can surface within the bulk delete.
* The delete POST returns 200, so the SDK is happy
* but one of the rows in the response reports the S3Error "InternalError": {{Code=InternalError, Message=We encountered an internal error. Please try again.}}

Proposed:
* The bulk delete invoker must map "InternalError" to AWSStatus500Exception and throw that.
* Add a retry policy for bulk deletes which considers AWSStatus500Exception retriable. We currently don't retry, on the assumption that the SDK will, which it does for ordinary requests, but clearly not for multi-object delete.
* Maybe also consider the possibility that a partial 503 response could be generated, that is, only part of the delete is throttled.

{code}
Caused by: org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteException: [S3Error(Key=table/warehouse/tablespace/external/hive/table/-tmp.-ext-10000/file/, Code=InternalError, Message=We encountered an internal error. 
Please try again.)]
	at org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObjects(S3AFileSystem.java:3186)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeysS3(S3AFileSystem.java:3422)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeys(S3AFileSystem.java:3481)
	at org.apache.hadoop.fs.s3a.S3AFileSystem$OperationCallbacksImpl.removeKeys(S3AFileSystem.java:2558)
	at org.apache.hadoop.fs.s3a.impl.RenameOperation.lambda$removeSourceObjects$3(RenameOperation.java:625)
	at org.apache.hadoop.fs.s3a.Invoker.lambda$once$0(Invoker.java:165)
	at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:122)
{code}
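
The proposed handling (map a per-object "InternalError" row to a 500-class exception, then treat that exception as retriable for bulk deletes) could look roughly like the sketch below. It is illustrative only: {{BulkDeleteRetrySketch}}, {{ObjectError}}, {{Status500Exception}} and {{PartialDeleteException}} are hypothetical stand-ins, not the real S3A or AWS SDK classes, and the loop omits the backoff a real retry policy would apply. It also assumes that resubmitting the whole batch is acceptable because deleting an already-deleted key is not reported as an error.

```java
import java.util.List;
import java.util.function.Function;

/** Sketch only: stand-in types, not the Hadoop S3A or AWS SDK classes. */
final class BulkDeleteRetrySketch {

  /** Hypothetical stand-in for one per-object error row in a bulk delete response. */
  static final class ObjectError {
    final String key;
    final String code;
    final String message;

    ObjectError(String key, String code, String message) {
      this.key = key;
      this.code = code;
      this.message = message;
    }
  }

  /** Hypothetical stand-in for AWSStatus500Exception. */
  static final class Status500Exception extends RuntimeException {
    Status500Exception(String message) {
      super(message);
    }
  }

  /** Hypothetical stand-in for a partial failure that should not be retried. */
  static final class PartialDeleteException extends RuntimeException {
    PartialDeleteException(String message) {
      super(message);
    }
  }

  /** Raises a 500-class exception if any row is an InternalError; other errors fail fast. */
  static void throwOnErrors(List<ObjectError> errors) {
    if (errors.isEmpty()) {
      return;
    }
    for (ObjectError e : errors) {
      if ("InternalError".equals(e.code)) {
        throw new Status500Exception(e.key + ": " + e.message);
      }
    }
    throw new PartialDeleteException(errors.size() + " object(s) could not be deleted");
  }

  /**
   * Reissues the whole bulk delete while it fails with Status500Exception.
   * Returns the number of attempts used; rethrows the last 500 when attempts run out.
   */
  static int deleteWithRetries(List<String> keys,
      Function<List<String>, List<ObjectError>> bulkDelete,
      int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    Status500Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        throwOnErrors(bulkDelete.apply(keys));
        return attempt;
      } catch (Status500Exception e) {
        last = e; // retriable: try the batch again (no backoff in this sketch)
      }
    }
    throw last;
  }
}
```

Here {{bulkDelete}} represents the call that issues the delete POST and returns the error rows from its 200 response. A partial 503 could be handled the same way by mapping its error code to a throttling exception, which this sketch does not attempt.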