[ 
https://issues.apache.org/jira/browse/PULSAR-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Priyath Gregory updated PULSAR-7:
---------------------------------
    Issue Type: Improvement  (was: Bug)

> Possibility of data loss due to asynchronous checkpointing in kinesis consumer
> ------------------------------------------------------------------------------
>
>                 Key: PULSAR-7
>                 URL: https://issues.apache.org/jira/browse/PULSAR-7
>             Project: Pulsar
>          Issue Type: Improvement
>            Reporter: Priyath Gregory
>            Priority: Major
>
> The KinesisRecordProcessor pushes each record to a blocking queue and 
> checkpoints shard progress at every checkpoint interval. At the point of 
> checkpointing, however, there is no guarantee that the records still pending 
> in the queue have been delivered. If the instance fails after the checkpoint 
> but before those records are delivered, they are lost.
> Source: 
> [https://github.com/apache/pulsar/blob/master/pulsar-io/kinesis/src/main/java/org/apache/pulsar/io/kinesis/KinesisRecordProcessor.java]
> The fix should ideally cover the following scenarios, as far as I can see:
> 1. The periodic checkpoint at each checkpoint interval should be performed 
> only after the records pending up to that point are processed.
> 2. The checkpoint issued when onShardEnded is invoked should be performed 
> only after pending records are processed.
> 3. The checkpoint issued when shutdownRequested is invoked should be 
> performed only after pending records are processed.
>  
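One way to picture the three scenarios above is a small tracker that only ever reports a sequence number as safe to checkpoint once every record up to it has been delivered. The sketch below is only an illustration under stated assumptions: PendingCheckpointTracker and its methods (track, markDelivered, safeCheckpoint, isDrained) are hypothetical names, not part of Pulsar or the Kinesis Client Library, and the sketch assumes records are tracked in the order they are read from the shard.

```java
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Pulsar or the KCL.
// Assumes track() is called in the same order records are read from the shard.
class PendingCheckpointTracker {

    private static final class Entry {
        final String sequenceNumber;
        boolean delivered;

        Entry(String sequenceNumber) {
            this.sequenceNumber = sequenceNumber;
        }
    }

    // Records not yet released for checkpointing, keyed by local arrival order.
    private final TreeMap<Long, Entry> pending = new TreeMap<>();
    private long nextId = 0;
    private String lastSafe = null;

    // Register a record before it is pushed to the queue; returns a handle.
    synchronized long track(String sequenceNumber) {
        long id = nextId++;
        pending.put(id, new Entry(sequenceNumber));
        return id;
    }

    // Call once delivery of the record has been confirmed.
    synchronized void markDelivered(long id) {
        Entry e = pending.get(id);
        if (e != null) {
            e.delivered = true;
        }
    }

    // Highest sequence number such that it and every earlier record have
    // been delivered; empty if nothing can be checkpointed yet.
    synchronized Optional<String> safeCheckpoint() {
        while (!pending.isEmpty() && pending.firstEntry().getValue().delivered) {
            lastSafe = pending.pollFirstEntry().getValue().sequenceNumber;
        }
        return Optional.ofNullable(lastSafe);
    }

    // True once every delivered record has been released by safeCheckpoint()
    // and no undelivered record remains.
    synchronized boolean isDrained() {
        return pending.isEmpty();
    }
}
```

Under this sketch, the periodic checkpoint would pass the result of safeCheckpoint() to the checkpointer instead of checkpointing the latest record read, while the onShardEnded and shutdownRequested paths would wait until isDrained() returns true before checkpointing. How delivery confirmation reaches markDelivered depends on the connector and is left open here.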



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
