Priyath Gregory created PULSAR-7: ------------------------------------ Summary: Possibility of data loss due to asynchronous checkpointing in kinesis consumer Key: PULSAR-7 URL: https://issues.apache.org/jira/browse/PULSAR-7 Project: Pulsar Issue Type: Bug Reporter: Priyath Gregory
The KinesisRecordProcessor pushes each record to a blocking queue and proceeds to checkpoint shard progress on every checkpoint interval. But at the point of checkpointing, there is no guarantee of successful delivery of pending records in the queue, which can lead to data loss in case of instance failure. Source: [https://github.com/apache/pulsar/blob/master/pulsar-io/kinesis/src/main/java/org/apache/pulsar/io/kinesis/KinesisRecordProcessor.java] The fix should ideally cover the following scenarios as far as I can see: 1. Periodic checkpointing at each checkpoint interval should be performed after pending records are processed up until that point 2. Checkpoint when onShardEnded is invoked should be performed after pending records are processed. 3. Checkpoint when shutdownRequested is invoked should be performed after pending records are processed. -- This message was sent by Atlassian Jira (v8.3.4#803005)