[ https://issues.apache.org/jira/browse/FLINK-37627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17941693#comment-17941693 ]
Arun Lakshman edited comment on FLINK-37627 at 4/7/25 7:19 PM:
---------------------------------------------------------------

Can you please assign this issue to me? I can work on it.

was (Author: arunlakshman):
Can you please this to me. I can work on this issue

> Restarting from a checkpoint/savepoint which coincides with shard split causes data loss
> ----------------------------------------------------------------------------------------
>
>                 Key: FLINK-37627
>                 URL: https://issues.apache.org/jira/browse/FLINK-37627
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Kinesis
>    Affects Versions: aws-connector-5.0.0
>            Reporter: Keith Lee
>            Priority: Major
>
> Similar to the DDB stream connector's issue
> https://issues.apache.org/jira/browse/FLINK-37416
> This is less likely to happen on the Kinesis connector because re-sharding /
> assigning new splits is much less frequent, but it is technically possible,
> so we'd like to fix it to avoid data loss.
> The scenario is as follows:
> - A checkpoint starts
> - KinesisStreamsSourceEnumerator takes a checkpoint (the shard was assigned at this point)
> - KinesisStreamsSourceEnumerator sends the checkpoint event to the reader
> - Before the reader takes its checkpoint, a SplitFinishedEvent comes up in the reader
> - The reader takes its checkpoint
> - Just after the checkpoint completes, the job restarts
> This can lead to a shard lineage being lost, because the shard is in ASSIGNED
> state in the enumerator but is not part of any task manager's state.
> See the DDB connector issue's PR for a reference fix:
> https://issues.apache.org/jira/browse/FLINK-37416

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
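
The scenario in the issue description can be sketched as a small standalone simulation. This is only an illustration under stated assumptions: the class name, the `Status` enum, the shard id `shard-0`, and the helper methods are hypothetical and are not the connector's actual classes or API. It models the enumerator and reader state as plain collections, takes the two snapshots in the order described, and then checks which shards the enumerator snapshot marks as ASSIGNED while no reader snapshot contains them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the race; not the Kinesis connector's real classes.
class ShardSplitCheckpointRace {

    enum Status { UNASSIGNED, ASSIGNED, FINISHED }

    /** Shards the enumerator snapshot marks ASSIGNED but no reader snapshot holds. */
    static List<String> orphanedShards(Map<String, Status> enumeratorSnapshot,
                                       Set<String> readerSnapshot) {
        List<String> orphaned = new ArrayList<>();
        for (Map.Entry<String, Status> entry : enumeratorSnapshot.entrySet()) {
            if (entry.getValue() == Status.ASSIGNED
                    && !readerSnapshot.contains(entry.getKey())) {
                orphaned.add(entry.getKey());
            }
        }
        return orphaned;
    }

    /** Replays the steps from the issue description and returns the orphaned shards. */
    static List<String> simulate() {
        Map<String, Status> enumeratorState = new HashMap<>();
        Set<String> readerState = new HashSet<>();

        // Before the checkpoint: the shard is assigned to a reader.
        enumeratorState.put("shard-0", Status.ASSIGNED);
        readerState.add("shard-0");

        // The enumerator takes its checkpoint while the shard is still ASSIGNED.
        Map<String, Status> enumeratorSnapshot = new HashMap<>(enumeratorState);

        // Before the reader's checkpoint, the reader finishes the shard.
        // Only the live enumerator state sees this; its snapshot is already taken.
        readerState.remove("shard-0");
        enumeratorState.put("shard-0", Status.FINISHED);

        // The reader takes its checkpoint; the finished shard is no longer in it.
        Set<String> readerSnapshot = new HashSet<>(readerState);

        // The job restarts from the two snapshots.
        return orphanedShards(enumeratorSnapshot, readerSnapshot);
    }

    public static void main(String[] args) {
        // prints "Orphaned after restore: [shard-0]"
        System.out.println("Orphaned after restore: " + simulate());
    }
}
```

In this model the restored enumerator still considers `shard-0` ASSIGNED, but no restored reader owns it, which corresponds to the lost shard lineage the issue describes.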