[ https://issues.apache.org/jira/browse/PULSAR-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Priyath Gregory updated PULSAR-7: --------------------------------- Issue Type: Improvement (was: Bug) > Possibility of data loss due to asynchronous checkpointing in kinesis consumer > ------------------------------------------------------------------------------ > > Key: PULSAR-7 > URL: https://issues.apache.org/jira/browse/PULSAR-7 > Project: Pulsar > Issue Type: Improvement > Reporter: Priyath Gregory > Priority: Major > > The KinesisRecordProcessor pushes each record to a blocking queue and > proceeds to checkpoint shard progress on every checkpoint interval. But at > the point of checkpointing, there is no guarantee of successful delivery of > pending records in the queue, which can lead to data loss in case of instance > failure. > Source: > [https://github.com/apache/pulsar/blob/master/pulsar-io/kinesis/src/main/java/org/apache/pulsar/io/kinesis/KinesisRecordProcessor.java] > The fix should ideally cover the following scenarios as far as I can see: > 1. Periodic checkpointing at each checkpoint interval should be performed > after pending records are processed up until that point > 2. Checkpoint when onShardEnded is invoked should be performed after pending > records are processed. > 3. Checkpoint when shutdownRequested is invoked should be performed after > pending records are processed. > -- This message was sent by Atlassian Jira (v8.3.4#803005)