Steve Loughran created HADOOP-14303:
---------------------------------------

             Summary: review retry logic on all S3 calls, implement where needed
                 Key: HADOOP-14303
                 URL: https://issues.apache.org/jira/browse/HADOOP-14303
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs/s3
    Affects Versions: 2.8.0
            Reporter: Steve Loughran


AWS S3, IAM, KMS, DDB etc. all throttle callers: the S3A code needs to handle
this without failing. If it slows down its requests, it can recover.

1. Look at all the places where we are calling S3 via the AWS SDK and make
sure we are retrying with some backoff & jitter policy, ideally something
unified; see the sketch after this list. This must be more systematic than the
implicit case-by-case, problem-by-problem strategy we are using today.
2. Many of the AWS SDK S3 calls already implement retry (e.g. PUT/multipart
PUT), but we need to check the other parts of the process: login,
initiate/complete MPU, ...
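
As a rough illustration of what a unified policy could look like (every name
here is made up for the sketch, not a proposed API):

{code:java}
import java.io.IOException;
import java.util.Random;
import java.util.concurrent.Callable;

/**
 * Hypothetical sketch of a unified retry wrapper with exponential
 * backoff plus jitter; class, method and parameter names are
 * illustrative only.
 */
public final class S3ARetry {

  private static final Random RANDOM = new Random();

  /** Invoke the operation, retrying failures with backoff and jitter. */
  public static <T> T once(String operation, Callable<T> call,
      int maxAttempts, long baseDelayMs) throws IOException {
    IOException lastFailure = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (Exception e) {
        // A real policy would classify the failure here and rethrow
        // anything unrecoverable (auth, bad endpoint, ...) immediately.
        lastFailure = (e instanceof IOException)
            ? (IOException) e
            : new IOException(operation + " failed", e);
        if (attempt == maxAttempts) {
          break;
        }
        // Exponential backoff with jitter: half fixed, half random,
        // so throttled clients do not all retry in lockstep.
        long backoff = baseDelayMs << (attempt - 1);
        long sleep = backoff / 2
            + RANDOM.nextInt((int) Math.max(1L, backoff / 2));
        try {
          Thread.sleep(sleep);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IOException("interrupted retrying " + operation, ie);
        }
      }
    }
    throw lastFailure;
  }
}
{code}

Callers would then wrap each SDK invocation, something like
{{S3ARetry.once("PUT " + key, () -> s3.putObject(request), 5, 100)}}.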

Related:

HADOOP-13811 Failed to sanitize XML document destined for handler class
HADOOP-13664 S3AInputStream to use a retry policy on read failures

This stuff is all hard to test. A key need is to be able to differentiate
recoverable throttle & network failures from unrecoverable problems such as
auth failures and network misconfiguration (e.g. a bad endpoint).
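
For illustration, a first cut at classifying via the AWS SDK exception
metadata might look like the following; the class name and the exact set of
error codes are guesses to be refined:

{code:java}
import java.net.UnknownHostException;

import com.amazonaws.AmazonServiceException;

/** Illustrative classifier; names and code lists are placeholders. */
public class FailureClassifier {

  /** True iff the failure looks like throttling and is worth retrying. */
  public static boolean isThrottle(AmazonServiceException e) {
    // S3 throttling surfaces as a 503 "SlowDown"; DDB and KMS use
    // their own error codes but the same idea applies.
    return e.getStatusCode() == 503
        || "SlowDown".equals(e.getErrorCode())
        || "ThrottlingException".equals(e.getErrorCode())
        || "ProvisionedThroughputExceededException".equals(e.getErrorCode());
  }

  /** True iff retrying cannot help: auth or endpoint config problems. */
  public static boolean isUnrecoverable(Exception e) {
    if (e instanceof AmazonServiceException) {
      int status = ((AmazonServiceException) e).getStatusCode();
      return status == 401 || status == 403;   // auth failures
    }
    return e instanceof UnknownHostException;  // e.g. a bad endpoint
  }
}
{code}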

This may be the opportunity to add a faulting subclass of the Amazon S3 client
which can be configured in IT tests to fail at specific points. Ryan Blue's
mock S3 client does this in HADOOP-13786, but it is a 100% mock. I'm thinking
of something with similar fault raising, but in front of the real S3 client.
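
Roughly what I have in mind, as a sketch only (the class name, constructor
wiring and fault counter are all notional):

{code:java}
import com.amazonaws.AmazonServiceException;
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.PutObjectRequest;
import com.amazonaws.services.s3.model.PutObjectResult;

/**
 * Fault-injecting client in front of the real S3 client: the first N
 * PUTs fail with a synthetic 503 "SlowDown", then real traffic flows.
 */
public class ThrottlingAmazonS3Client extends AmazonS3Client {

  private int failuresLeft;

  public ThrottlingAmazonS3Client(AWSCredentialsProvider credentials,
      int failures) {
    super(credentials);
    this.failuresLeft = failures;
  }

  @Override
  public PutObjectResult putObject(PutObjectRequest request) {
    if (failuresLeft > 0) {
      failuresLeft--;
      AmazonServiceException e =
          new AmazonServiceException("simulated throttling");
      e.setStatusCode(503);
      e.setErrorCode("SlowDown");
      throw e;
    }
    // Past the injected faults: delegate to the real S3 endpoint.
    return super.putObject(request);
  }
}
{code}

The nice thing about subclassing rather than mocking is that once the injected
faults are exhausted, the test is still talking to a live store.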


