[ 
https://issues.apache.org/jira/browse/FLINK-21427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289234#comment-17289234
 ] 

Roman Khachatryan commented on FLINK-21427:
-------------------------------------------

It's hard to say what's going on without looking at the sink code and the logs.

I'd verify that this checkpoint wasn't already "completed" by the sink. If it 
was, then the object should be removed. 

And would also check if this issue is observed on a local FS (S3 introduced 
strong consistency recently, but I don't know whether it needs to be enabled 
explicitly).

 

Can you explain the connection between serialization and finding a state object 
on S3?

 

Another question: XA version of exactly-once JDBC has already been merged and 
planed to for 1.13 release.

Is there any particular reason to go with a WAL version?

> Recovering from savepoint key does not exist in S3
> --------------------------------------------------
>
>                 Key: FLINK-21427
>                 URL: https://issues.apache.org/jira/browse/FLINK-21427
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.11.2
>            Reporter: Kenzyme Le
>            Priority: Major
>
> Hi,
> I was able to stop and generate a savepoint successfully, but resuming the 
> job caused repeated errors in the logs *specified key does not exist* in S3.
> Plugin used: *flink-s3-fs-presto-1.11.2.jar*
>  
> {code:java}
> [] - Could not commit checkpoint.
> com.facebook.presto.hive.s3.PrestoS3FileSystem$UnrecoverableS3OperationException:
>  com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does 
> not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; 
> Request ID: D76810E62E37680C; S3 Extended Request ID: 
> h5MRYXOZj5UnPEVjYtMOnUqXUSeJ784eFz3PEQjdT8B7499ZvV+3DHNLII8WLVbVhJ1/ujPG7Bo=),
>  S3 Extended Request ID: 
> h5MRYXOZj5UnPEVjYtMOnUqXUSeJ784eFz3PEQjdT8B7499ZvV+3DHNLII8WLVbVhJ1/ujPG7Bo= 
> (Path: 
> s3p://app/flink/checkpoints/prod/613240ac4a3ebb2e1a428bbd1a973433/taskowned/7a252162-f002-4afc-a45a-04b0a622c204)
>       at 
> com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.lambda$openStream$1(PrestoS3FileSystem.java:917)
>  ~[?:?]
>       at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:138) ~[?:?]
>       at 
> com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.openStream(PrestoS3FileSystem.java:902)
>  ~[?:?]
>       at 
> com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.openStream(PrestoS3FileSystem.java:887)
>  ~[?:?]
>       at 
> com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.seekStream(PrestoS3FileSystem.java:880)
>  ~[?:?]
>       at 
> com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.lambda$read$0(PrestoS3FileSystem.java:819)
>  ~[?:?]
>       at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:138) ~[?:?]
>       at 
> com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.read(PrestoS3FileSystem.java:818)
>  ~[?:?]
>       at java.io.BufferedInputStream.fill(BufferedInputStream.java:252) ~[?:?]
>       at java.io.BufferedInputStream.read(BufferedInputStream.java:271) ~[?:?]
>       at java.io.FilterInputStream.read(FilterInputStream.java:83) ~[?:?]
>       at 
> org.apache.flink.fs.s3presto.common.HadoopDataInputStream.read(HadoopDataInputStream.java:84)
>  ~[?:?]
>       at 
> org.apache.flink.core.fs.FSDataInputStreamWrapper.read(FSDataInputStreamWrapper.java:51)
>  ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>       at java.io.DataInputStream.readByte(DataInputStream.java:270) ~[?:?]
>       at 
> org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:435)
>  ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>       at 
> org.apache.flink.runtime.io.disk.InputViewIterator.next(InputViewIterator.java:43)
>  ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>       at 
> org.apache.flink.runtime.util.ReusingMutableToRegularIteratorWrapper.hasNext(ReusingMutableToRegularIteratorWrapper.java:61)
>  ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>       at 
> com.app.streams.kernel.sinks.JdbcBatchWriteAheadSink.sendValues(JdbcBatchWriteAheadSink.java:52)
>  
> ~[blob_p-642a2d12ebc0fdfb4a406ab5e9ebff24a2edf335-29b861b5042a27f11426951b8a753b1f:?]
>       at 
> org.apache.flink.streaming.runtime.operators.GenericWriteAheadSink.notifyCheckpointComplete(GenericWriteAheadSink.java:233)
>  ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>       at 
> org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.notifyCheckpointComplete(StreamOperatorWrapper.java:107)
>  ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>       at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyCheckpointComplete(SubtaskCheckpointCoordinatorImpl.java:283)
>  ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>       at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.notifyCheckpointComplete(StreamTask.java:987)
>  ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>       at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointCompleteAsync$10(StreamTask.java:958)
>  ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>       at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointOperation$12(StreamTask.java:974)
>  ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>       at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47)
>  ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>       at 
> org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:78) 
> ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>       at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxExecutorImpl.yield(MailboxExecutorImpl.java:79)
>  ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>       at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runSynchronousSavepointMailboxLoop(StreamTask.java:406)
>  ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>       at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:881)
>  ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>       at 
> org.apache.flink.streaming.runtime.io.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:113)
>  ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>       at 
> org.apache.flink.streaming.runtime.io.CheckpointBarrierAligner.processBarrier(CheckpointBarrierAligner.java:198)
>  ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>       at 
> org.apache.flink.streaming.runtime.io.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:93)
>  ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>       at 
> org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:158)
>  ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>       at 
> org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:67)
>  ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>       at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:351)
>  ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>       at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:191)
>  [flink-dist_2.12-1.11.2.jar:1.11.2]
>       at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:181)
>  [flink-dist_2.12-1.11.2.jar:1.11.2]
>       at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:566)
>  [flink-dist_2.12-1.11.2.jar:1.11.2]
>       at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:536)
>  [flink-dist_2.12-1.11.2.jar:1.11.2]
>       at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721) 
> [flink-dist_2.12-1.11.2.jar:1.11.2]
>       at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546) 
> [flink-dist_2.12-1.11.2.jar:1.11.2]
>       at java.lang.Thread.run(Thread.java:834) [?:?]
> Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The specified 
> key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: 
> NoSuchKey; Request ID: D76810E62E37680C; S3 Extended Request ID: 
> h5MRYXOZj5UnPEVjYtMOnUqXUSeJ784eFz3PEQjdT8B7499ZvV+3DHNLII8WLVbVhJ1/ujPG7Bo=)
>       at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1799)
>  ~[?:?]
>       at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1383)
>  ~[?:?]
>       at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1359)
>  ~[?:?]
>       at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1139)
>  ~[?:?]
>       at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:796)
>  ~[?:?]
>       at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:764)
>  ~[?:?]
>       at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:738)
>  ~[?:?]
>       at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:698)
>  ~[?:?]
>       at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:680)
>  ~[?:?]
>       at 
> com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:544) ~[?:?]
>       at 
> com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:524) ~[?:?]
>       at 
> com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5054) 
> ~[?:?]
>       at 
> com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5000) 
> ~[?:?]
>       at 
> com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1486) 
> ~[?:?]
>       at 
> com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.lambda$openStream$1(PrestoS3FileSystem.java:905)
>  ~[?:?]
>       ... 41 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to