[ https://issues.apache.org/jira/browse/FLINK-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16248266#comment-16248266 ]
ASF GitHub Bot commented on FLINK-4500:
---------------------------------------

Github user mcfongtw commented on the issue:

    https://github.com/apache/flink/pull/4605

    Hi, @zentol, thanks for reviewing this PR. I recall that I put a [caveat](https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/connectors/cassandra.html#checkpointing-and-fault-tolerance) about this potential data loss in the latest C* connector documentation. Since this fix is committed, would you like me to open another PR just to remove that warning message from the documentation?

> Cassandra sink can lose messages
> --------------------------------
>
>                 Key: FLINK-4500
>                 URL: https://issues.apache.org/jira/browse/FLINK-4500
>             Project: Flink
>          Issue Type: Bug
>          Components: Cassandra Connector
>    Affects Versions: 1.1.0
>            Reporter: Elias Levy
>            Assignee: Michael Fong
>             Fix For: 1.4.0
>
>
> The problem is the same as I pointed out with the Kafka producer sink
> (FLINK-4027). The CassandraTupleSink's send() and CassandraPojoSink's send()
> both send data asynchronously to Cassandra and record whether an error occurs
> via a future callback. But CassandraSinkBase does not implement
> Checkpointed, so it can't stop a checkpoint from happening even though there
> are Cassandra queries in flight from the checkpoint that may fail. If they do
> fail, they would subsequently not be replayed when the job recovers, and
> would thus be lost.
> In addition,
> CassandraSinkBase's close should check whether there is a pending exception
> and throw it, rather than silently close. It should also wait for any
> pending async queries to complete and check their status before closing.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
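
For readers following the issue, the behaviour the description asks for can be sketched roughly as below. This is a minimal, self-contained Java sketch and not the code from the pull request. `PendingWriteTracker` and `flushBeforeCheckpoint()` are hypothetical names, and `CompletableFuture` stands in for whatever future the Cassandra driver returns. The idea is to count in-flight writes, block the checkpoint hook until the count reaches zero, and rethrow any recorded failure so that the checkpoint fails and the records are replayed on recovery.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper; an illustration only, not Flink's CassandraSinkBase. */
class PendingWriteTracker {
    private final Object lock = new Object();
    private int pending = 0;
    private final AtomicReference<Throwable> failure = new AtomicReference<>();

    /** Registers an asynchronous write and records its outcome once it completes. */
    void send(CompletableFuture<?> write) {
        checkFailure(); // fail fast if an earlier write already failed
        synchronized (lock) {
            pending++;
        }
        write.whenComplete((result, error) -> {
            if (error != null) {
                failure.compareAndSet(null, error); // keep the first failure
            }
            synchronized (lock) {
                pending--;
                lock.notifyAll();
            }
        });
    }

    /** Intended for the checkpoint hook: waits for in-flight writes, then surfaces errors. */
    void flushBeforeCheckpoint() {
        synchronized (lock) {
            while (pending > 0) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for pending writes", e);
                }
            }
        }
        checkFailure();
    }

    /** Closing applies the same rule: wait for pending writes and never swallow a failure. */
    void close() {
        flushBeforeCheckpoint();
    }

    private void checkFailure() {
        Throwable t = failure.get();
        if (t != null) {
            throw new IllegalStateException("Asynchronous write failed", t);
        }
    }
}
```

In a real sink, `flushBeforeCheckpoint()` would be called from the state-snapshot callback, so a checkpoint cannot complete while a write that belongs to it is still unacknowledged.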