Re: Flink Exception - AmazonS3Exception and ExecutionGraph - Error in failover strategy

2018-12-04 Thread Flink Developer
When this happens, it appears that one of the workers fails while the rest of the workers continue to run. How can I configure the app so that it recovers completely from the last successful checkpoint when this happens? ‐‐‐ Original Message ‐‐‐ On Monday, December 3,
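A minimal sketch of one common approach, assuming the Flink 1.5 DataStream API in Java and a checkpoint directory configured in `flink-conf.yaml` (for example via `state.checkpoints.dir`). With checkpointing enabled, a restart strategy tells Flink to restart the job after a failure and restore state from the latest completed checkpoint. The attempt count, the delay and the class name are hypothetical, and the thread does not confirm that this resolves the partial-failure behaviour described above.

```java
import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical class name; only the environment configuration is shown.
public class RecoveryConfigSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 2 minutes, matching the interval in the original message.
        env.enableCheckpointing(120_000L);

        // Assumption: up to 10 restart attempts, 30 seconds apart.
        // On each restart, operator state is restored from the latest completed checkpoint.
        env.setRestartStrategy(
                RestartStrategies.fixedDelayRestart(10, Time.of(30, TimeUnit.SECONDS)));

        // Retain checkpoint data when the job is cancelled, so the job can also be
        // resubmitted manually from it (e.g. `flink run -s <checkpoint-path> ...`).
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // ... Kafka source, transformations and BucketingSink go here ...
        // env.execute("hypothetical-job-name");
    }
}
```

If all restart attempts are exhausted, the job fails; the retained checkpoint is then the fallback for a manual restart with `flink run -s`.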

Flink Exception - AmazonS3Exception and ExecutionGraph - Error in failover strategy

2018-12-03 Thread Flink Developer
I have a Flink app on 1.5.2 that sources data from a Kafka topic (400 partitions) and runs with a parallelism of 400. The sink is a bucketing sink writing to S3, and the job uses the RocksDB state backend. The checkpoint interval is 2 min and the checkpoint timeout is 2 min. The checkpoint size is a few MB. After running for a few days, I see: Org
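The setup described above can be sketched in Java as follows, assuming the Flink 1.5 DataStream API and the `flink-statebackend-rocksdb` dependency on the classpath. The S3 checkpoint URI and the class name are hypothetical placeholders, and the Kafka source and bucketing sink are omitted because the message does not give their settings.

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical class name; reproduces only the settings stated in the message.
public class DescribedJobSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Parallelism 400, one subtask per Kafka partition.
        env.setParallelism(400);

        // Checkpoint interval and checkpoint timeout are both 2 minutes.
        env.enableCheckpointing(120_000L);
        env.getCheckpointConfig().setCheckpointTimeout(120_000L);

        // RocksDB state backend; the S3 URI below is a hypothetical placeholder.
        env.setStateBackend(new RocksDBStateBackend("s3://hypothetical-bucket/flink-checkpoints"));

        // Kafka source (400 partitions) and bucketing sink to S3 are omitted here.
        // env.execute("hypothetical-job-name");
    }
}
```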