[ 
https://issues.apache.org/jira/browse/FLINK-36239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh resolved FLINK-36239.
-------------------------------------
    Resolution: Fixed

merged commit [{{876892d}}|https://github.com/apache/flink-connector-aws/commit/876892d492dd560ccb00e61d8af66e5bc4eacbb3] into apache:main

> DDB Streams Connector reprocessing due to DescribeStream inconsistencies for 
> trimmed shards
> -------------------------------------------------------------------------------------------
>
>                 Key: FLINK-36239
>                 URL: https://issues.apache.org/jira/browse/FLINK-36239
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / DynamoDB
>            Reporter: Abhi Gupta
>            Priority: Major
>              Labels: pull-request-available
>
> *Problem*
> Events can be reprocessed when DDB Streams shards are deleted by DDB 
> after 24 hours. 
> *Root cause*
> We use the DDB DescribeStream API to retrieve the list of shards to consume from. 
> This API is eventually consistent when it comes to deleting expired shards, 
> so some responses will include an expired shard and some will not.
> In the DDB Streams connector, shards have the following lifecycle:
>  # {*}Discovery{*}: Shard discovered (known)
>  # {*}Assign{*}: Shard assigned (assigned)
>  # {*}Finished{*}: Once shard is finished, it will be moved to finished 
> (finished)
>  # *Cleanup:* Once the shard is finished, DescribeStream no longer returns the 
> shardId, and more than 24 hours have passed, we delete the shard from state (to 
> prevent accumulating unnecessary state).
> *Example:*
>  * We called DescribeStream and processed shard-a more than 24 hours ago.
>  * The shard has since been removed by DDB because it is more than 24 hours old.
>  * We made a DescribeStream call, and the response did not include this shard.
>  * We made another DescribeStream call. This time the response included the shard 
> again because of the inconsistency, and we passed it to SplitTracker. The shard was 
> no longer in the finished shards, because we had deleted it from state shortly 
> before for being more than 25 hours old, so it was processed a second time.
> *Fix*
> We'll increase the shard retention to 48 hours to avoid these edge cases.
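
The scenario described above can be replayed in a small, self-contained Java sketch. This is only an illustration, not the connector's actual code: the class `ShardRetentionSketch`, its methods, and the timestamps are hypothetical, and it assumes the retention clock starts when the shard finishes. It shows why a 24-hour retention lets an inconsistent DescribeStream response reintroduce a finished shard, while a 48-hour retention does not.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of finished-shard tracking; not the real SplitTracker. */
public class ShardRetentionSketch {

    /** Finished shard id -> time at which the shard finished (assumed clock start). */
    private final Map<String, Instant> finishedShards = new HashMap<>();
    private final Duration retention;

    public ShardRetentionSketch(Duration retention) {
        this.retention = retention;
    }

    public void markFinished(String shardId, Instant finishedAt) {
        finishedShards.put(shardId, finishedAt);
    }

    /** Handles one DescribeStream response and returns the shards that look new. */
    public Set<String> onDescribeStream(Set<String> listedShardIds, Instant now) {
        // Cleanup: forget finished shards that are absent from this response
        // and older than the retention period.
        finishedShards.entrySet().removeIf(e ->
                !listedShardIds.contains(e.getKey())
                        && Duration.between(e.getValue(), now).compareTo(retention) > 0);

        // Discovery: any listed shard not tracked as finished is treated as new.
        Set<String> newlyDiscovered = new HashSet<>(listedShardIds);
        newlyDiscovered.removeAll(finishedShards.keySet());
        return newlyDiscovered;
    }

    /** Replays the example from the issue with the given retention. */
    public static Set<String> replay(Duration retention) {
        ShardRetentionSketch tracker = new ShardRetentionSketch(retention);
        Instant finishedAt = Instant.parse("2024-09-01T00:00:00Z");
        tracker.markFinished("shard-a", finishedAt);

        // 25 hours later, one response omits the expired shard...
        Instant later = finishedAt.plus(Duration.ofHours(25));
        tracker.onDescribeStream(Set.of(), later);

        // ...and a slightly later, inconsistent response lists it again.
        return tracker.onDescribeStream(Set.of("shard-a"), later.plusSeconds(60));
    }

    public static void main(String[] args) {
        System.out.println("24h retention rediscovers: " + replay(Duration.ofHours(24)));
        System.out.println("48h retention rediscovers: " + replay(Duration.ofHours(48)));
    }
}
```

With a 24-hour retention the replay reports `[shard-a]` as newly discovered, which corresponds to the duplicate processing; with 48 hours the shard is still tracked as finished and the result is empty. The longer retention only helps as long as stale responses stop listing the shard well within the extra 24 hours.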



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
