[ https://issues.apache.org/jira/browse/FLINK-7637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16216206#comment-16216206 ]
ASF GitHub Bot commented on FLINK-7637:
---------------------------------------

Github user bowenli86 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4871#discussion_r146440154

    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java ---
    @@ -265,19 +259,86 @@ public void close() throws Exception {
             if (kp != null) {
                 LOG.info("Flushing outstanding {} records", kp.getOutstandingRecordsCount());
                 // try to flush all outstanding records
    -            while (kp.getOutstandingRecordsCount() > 0) {
    -                kp.flush();
    -                try {
    -                    Thread.sleep(500);
    -                } catch (InterruptedException e) {
    -                    LOG.warn("Flushing was interrupted.");
    -                    // stop the blocking flushing and destroy producer immediately
    -                    break;
    -                }
    -            }
    +            flushSync(kp);
    +
                 LOG.info("Flushing done. Destroying producer instance.");
                 kp.destroy();
             }
    +
    +        // make sure we propagate pending errors
    +        checkAndPropagateAsyncError();
         }
    +    @Override
    +    public void initializeState(FunctionInitializationContext context) throws Exception {
    +        // nothing to do
    +    }
    +
    +    @Override
    +    public void snapshotState(FunctionSnapshotContext context) throws Exception {
    +        // check for asynchronous errors and fail the checkpoint if necessary
    +        checkAndPropagateAsyncError();
    +
    +        flushSync(producer);
    +        if (producer.getOutstandingRecordsCount() > 0) {
    --- End diff --

    what if records are added by another thread between the calls of `flushSync()` and `producer.getOutstandingRecordsCount()`?


> FlinkKinesisProducer violates at-least-once guarantees
> ------------------------------------------------------
>
>                 Key: FLINK-7637
>                 URL: https://issues.apache.org/jira/browse/FLINK-7637
>             Project: Flink
>          Issue Type: Bug
>          Components: Kinesis Connector
>            Reporter: Tzu-Li (Gordon) Tai
>            Assignee: Tzu-Li (Gordon) Tai
>            Priority: Blocker
>             Fix For: 1.4.0, 1.3.3
>
>
> Currently, there is no flushing of KPL outstanding records on checkpoints in
> the {{FlinkKinesisProducer}}. Likewise to the at-least-once issue on the
> Flink Kafka producer before, this may lead to data loss if there are
> asynchronous failing records after a checkpoint which the records was part of
> was completed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
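
For illustration only, here is a minimal sketch of the concern raised in the review comment: the check of the outstanding-record count after a flush is only meaningful if no other thread can add records in between. The sketch assumes a hypothetical `StubProducer` and `Sink` (neither is the KPL nor the Flink API) and assumes that record writes and the snapshot are guarded by one shared lock. Whether the real connector gets an equivalent guarantee from its runtime is not shown here.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class FlushOnSnapshotSketch {

    /** Hypothetical stand-in for an asynchronous producer; this is NOT the KPL API. */
    static class StubProducer {
        private final AtomicInteger outstanding = new AtomicInteger();

        /** Buffers a record; it counts as outstanding until flushed. */
        void addRecord(String record) {
            outstanding.incrementAndGet();
        }

        /** Pretends that one buffered record is acknowledged per call. */
        void flush() {
            outstanding.updateAndGet(n -> n > 0 ? n - 1 : 0);
        }

        int getOutstandingRecordsCount() {
            return outstanding.get();
        }
    }

    /** Hypothetical sink: writes and snapshots share one lock (an assumption of this sketch). */
    static class Sink {
        private final Object lock = new Object();
        private final StubProducer producer = new StubProducer();

        void invoke(String record) {
            synchronized (lock) {
                producer.addRecord(record);
            }
        }

        /** Blocks until nothing is outstanding, then verifies the count under the same lock. */
        void snapshotState() {
            synchronized (lock) {
                while (producer.getOutstandingRecordsCount() > 0) {
                    producer.flush();
                }
                // No invoke() can run between the loop and this check,
                // because both paths hold the same lock.
                if (producer.getOutstandingRecordsCount() > 0) {
                    throw new IllegalStateException("records still outstanding after flush");
                }
            }
        }

        int outstanding() {
            synchronized (lock) {
                return producer.getOutstandingRecordsCount();
            }
        }
    }

    public static void main(String[] args) {
        Sink sink = new Sink();
        sink.invoke("a");
        sink.invoke("b");
        sink.snapshotState();
        System.out.println("outstanding after snapshot: " + sink.outstanding());
    }
}
```

Without the shared lock, a record added after the flush loop but before the check would make the snapshot fail even though every record that belongs to the checkpoint was flushed.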