I'm using RocksDB and S3 for savepoints. The Flink version is 1.4. Currently I'm creating a savepoint for the jobs every 10 minutes. When the job starts, the savepoints are saved as expected, but after a while I see this message in my log and savepoints are no longer saved:
2018-06-19 12:23:18,992 [flink-akka.actor.default-dispatcher-19] WARN httpclient.RestStorageService (RestStorageService.java:performRequest(434)) - *Error Response: HEAD '/savepoint/849365e0-ab3c-4a37-9e30-53369a2bbbd2/sp/savepoint-869fa8-7b601cae0b4a' -- ResponseCode: 404*, ResponseStatus: Not Found, Request Headers: [Content-Type: , x-amz-request-payer: requester, x-amz-, User-Agent: AWS-ElasticMapReduce, Host: dp-rill-prod.s3.amazonaws.com], Response Headers: [x-amz-request-id: 9EC59912DD0FE206, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Tue, 19 Jun 2018 12:23:18 GMT, Server: AmazonS3]

Each savepoint is saved as a separate folder in S3. The path that returns the 404 does exist in S3. I read here on the forum <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Exception-when-restoring-state-from-RocksDB-how-to-recover-td9434.html#a9448> that directories are the "eventually consistent" part of S3. Could this error be caused by that? If so, what would be a workaround? Would decreasing the savepoint frequency help?

Has anyone faced a similar issue with RocksDB and S3? I would appreciate any help.

Thanks in advance!