Pawel Bartoszek created FLINK-10664:
---------------------------------------
             Summary: Flink: Checkpointing fails with S3 exception - Please reduce your request rate
                 Key: FLINK-10664
                 URL: https://issues.apache.org/jira/browse/FLINK-10664
             Project: Flink
          Issue Type: Improvement
          Components: JobManager, TaskManager
    Affects Versions: 1.6.1, 1.5.4
            Reporter: Pawel Bartoszek


When a checkpoint is created for a job with many operators, Flink can upload too many checkpoint files to S3 at the same time, which results in S3 throttling the requests (503 SlowDown).

{code:java}
Caused by: org.apache.hadoop.fs.s3a.AWSS3IOException: saving output on flink/state-checkpoints/7bbd6495f90257e4bc037ecc08ba21a5/chk-19/4422b088-0836-4f12-bbbe-7e731da11231: com.amazonaws.services.s3.model.AmazonS3Exception: Please reduce your request rate. (Service: Amazon S3; Status Code: 503; Error Code: SlowDown; Request ID: XXXX; S3 Extended Request ID: XXX), S3 Extended Request ID: XXX: Please reduce your request rate. (Service: Amazon S3; Status Code: 503; Error Code: SlowDown; Request ID: 5310EA750DF8B949; S3 Extended Request ID: XXX)
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:178)
	at org.apache.hadoop.fs.s3a.S3AOutputStream.close(S3AOutputStream.java:121)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:74)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:108)
	at org.apache.flink.runtime.fs.hdfs.HadoopDataOutputStream.close(HadoopDataOutputStream.java:52)
	at org.apache.flink.core.fs.ClosingFSDataOutputStream.close(ClosingFSDataOutputStream.java:64)
	at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.closeAndGetHandle(FsCheckpointStreamFactory.java:311)
{code}

Could the upload be retried with some kind of backoff?
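For illustration only, a minimal sketch of the kind of retry-with-backoff behaviour this asks for. The helper class ({{BackoffRetry}}), its interface, and the parameter values are invented for this example; this is not existing Flink or S3A code, and a real fix would also restrict retries to throttling errors (status 503 / error code "SlowDown").

{code:java}
import java.io.IOException;
import java.util.concurrent.ThreadLocalRandom;

/**
 * Hypothetical sketch: retry a unit of work (e.g. closing the checkpoint
 * output stream, which triggers the S3 upload) with exponential backoff
 * plus jitter. Names and parameters are illustrative, not Flink code.
 */
public final class BackoffRetry {

    /** A unit of work that may throw IOException, e.g. stream::closeAndGetHandle. */
    @FunctionalInterface
    public interface Upload<T> {
        T run() throws IOException;
    }

    /**
     * Runs the upload, retrying up to maxAttempts times. The delay grows
     * exponentially (baseDelayMs, 2x, 4x, ...) and a random jitter is added
     * so that many parallel operators do not retry at the same instant.
     */
    public static <T> T withBackoff(Upload<T> upload, int maxAttempts, long baseDelayMs)
            throws IOException {
        IOException lastError = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return upload.run();
            } catch (IOException e) {
                lastError = e;
                // A real implementation would only retry throttling errors here.
                long delay = baseDelayMs << attempt;
                long jitter = ThreadLocalRandom.current().nextLong(baseDelayMs);
                try {
                    Thread.sleep(delay + jitter);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IOException("Interrupted while backing off", ie);
                }
            }
        }
        throw lastError;
    }

    private BackoffRetry() {}
}
{code}

A call site could then wrap the failing close, for example {{BackoffRetry.withBackoff(stream::closeAndGetHandle, 5, 100)}}. The jitter matters as much as the backoff itself: with many operators checkpointing in parallel, spreading the retries out is what actually reduces the request rate seen by S3.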