André Kelpe created HADOOP-18285:
------------------------------------

             Summary: S3a should retry when being throttled by STS (assumed roles)
                 Key: HADOOP-18285
                 URL: https://issues.apache.org/jira/browse/HADOOP-18285
             Project: Hadoop Common
          Issue Type: Improvement
    Affects Versions: 3.3.3
            Reporter: André Kelpe


We ran into an issue where AWS throttled us while we were reading from a 
bucket using the STS assume-role mechanism.

 

The stack trace looks like this:

 
{code:java}
Caused by: com.amazonaws.services.securitytoken.model.AWSSecurityTokenServiceException: Rate exceeded (Service: AWSSecurityTokenService; Status Code: 400; Error Code: Throttling; Request ID: 02f32511-418c-4b2a-96ef-2d7ba8dafab1; Proxy: null)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1862)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1415)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1384)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1154)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:811)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:713)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:695)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:539)
        at com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.doInvoke(AWSSecurityTokenServiceClient.java:1682)
        at com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke(AWSSecurityTokenServiceClient.java:1649)
        at com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke(AWSSecurityTokenServiceClient.java:1638)
        at com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.executeAssumeRole(AWSSecurityTokenServiceClient.java:498)
        at com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.assumeRole(AWSSecurityTokenServiceClient.java:467)
        at com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider.newSession(STSAssumeRoleSessionCredentialsProvider.java:348)
        at com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider.access$000(STSAssumeRoleSessionCredentialsProvider.java:44)
        at com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider$1.call(STSAssumeRoleSessionCredentialsProvider.java:93)
        at com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider$1.call(STSAssumeRoleSessionCredentialsProvider.java:90)
        at com.amazonaws.auth.RefreshableTask.refreshValue(RefreshableTask.java:295)
        at com.amazonaws.auth.RefreshableTask.blockingRefresh(RefreshableTask.java:251)
        at com.amazonaws.auth.RefreshableTask.getValue(RefreshableTask.java:192)
        at com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider.getCredentials(STSAssumeRoleSessionCredentialsProvider.java:320)
{code}

From reading the code, the exception appears to be handled by S3AUtils here:
[https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java#L240]

That code does not inspect the message any further and assumes that the 400 
is a genuine bad request. The exception is therefore translated into an 
AWSBadRequestException, which causes the S3ARetryPolicy to fail the request 
instead of retrying it:

[https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ARetryPolicy.java#L215-L217]

 

A better approach seems to be to look at the sub-type and message of the 
original exception and, when they indicate throttling, throw an exception 
other than AWSBadRequestException so that the request is retried with 
back-off.
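
To make the proposal concrete, here is a minimal sketch of the idea. It is not a patch against hadoop-aws, and it uses neither the real S3AUtils/S3ARetryPolicy code nor the AWS SDK. Every name in it (StsThrottleSketch, ServiceError, Action, classify, retry) is hypothetical and exists only for illustration. The only values taken from the report are the status code 400 and the error code "Throttling" from the stack trace. The back-off parameters are assumptions.

```java
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names exist in Hadoop or the AWS SDK. */
class StsThrottleSketch {

    /** Stand-in for a service exception carrying an HTTP status and an error code. */
    static class ServiceError extends RuntimeException {
        final int statusCode;
        final String errorCode;

        ServiceError(String message, int statusCode, String errorCode) {
            super(message);
            this.statusCode = statusCode;
            this.errorCode = errorCode;
        }
    }

    enum Action { RETRY_WITH_BACKOFF, FAIL }

    /**
     * Inspect the error code before treating a 400 as a bad request.
     * "Throttling" is the code seen in the stack trace; whether other
     * codes should be treated the same way is left open here.
     */
    static Action classify(ServiceError e) {
        if (e.statusCode == 400 && "Throttling".equals(e.errorCode)) {
            return Action.RETRY_WITH_BACKOFF;
        }
        // Everything else fails fast in this sketch, including plain 400s.
        return Action.FAIL;
    }

    /** Retry with exponential back-off while the error is classified as retryable. */
    static <T> T retry(Supplier<T> op, int maxAttempts, long baseDelayMs) {
        for (int attempt = 1; ; attempt++) {
            try {
                return op.get();
            } catch (ServiceError e) {
                if (classify(e) == Action.FAIL || attempt >= maxAttempts) {
                    throw e;
                }
                try {
                    // Delay doubles on each attempt: base, 2*base, 4*base, ...
                    Thread.sleep(baseDelayMs << (attempt - 1));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted during back-off", ie);
                }
            }
        }
    }
}
```

With this classification, a 400 carrying the "Throttling" code is retried until it succeeds or the attempt limit is reached, while any other 400 still fails immediately, as it does today.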

 


