[ https://issues.apache.org/jira/browse/FLINK-36239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hong Liang Teoh resolved FLINK-36239. ------------------------------------- Resolution: Fixed merged commit [{{876892d}}|https://github.com/apache/flink-connector-aws/commit/876892d492dd560ccb00e61d8af66e5bc4eacbb3] into apache:main > DDB Streams Connector reprocessing due to DescribeStream inconsistencies for > trimmed shards > ------------------------------------------------------------------------------------------- > > Key: FLINK-36239 > URL: https://issues.apache.org/jira/browse/FLINK-36239 > Project: Flink > Issue Type: Bug > Components: Connectors / DynamoDB > Reporter: Abhi Gupta > Priority: Major > Labels: pull-request-available > > *Problem* > We can have reprocessing of events when DDBStream shards are deleted by DDB > after 24 hours. > *Root cause* > We use DDB DescribeStream API to retrieve a list of shards to consume from. > This API is eventually consistent when it comes to deleting expired shards, > so some responses will include it, and some will not. > On DDBStreams connector, shards have the following lifecycle. > # {*}Discovery{*}: Shard discovered (known) > # {*}Assign{*}: Shard assigned (assigned) > # {*}Finished{*}: Once shard is finished, it will be moved to finished > (finished) > # *Cleanup:* Once shard is finished, DescribeStream doesn't return the > shardId, and >24h has progressed, we will delete the shard from state (to > prevent accumulating unnecessary state). > *Example:* > * We did describestream and processed shard-a >24 hours ago > * Now the shard has been removed since its more than 24 hours. > * We just got a describestream call for this shard. > * Describestream didn’t give this shard > * We got another describestream call. Describestream somehow sent that shard > back due to inconsistencies, we sent this out to SplitTracker. This shard was > not in finished shards, since we just deleted it some time back since it was > more than 25 hours old, so this shard would be processed duplicately. > *Fix* > We'll make the shard retention to be 48 hours to avoid these edge cases -- This message was sent by Atlassian Jira (v8.20.10#820010)