Hi Jiawei, I agree that the offset management mechanism uses the same code as Kinesis Stream Consumer and in theory should not lose exactly-once semantics. As Ying is alluding to, if your application is restarted and you have snapshotting disabled in AWS there is a chance that records can be lost between runs. However, if you have snapshotting enabled then the application should continue consuming records from the last processed sequence number.
I am happy to take a deeper look if you can provide more information/logs/code. Thanks, From: Ying Xu <y...@lyft.com> Date: Monday, 14 September 2020 at 19:48 To: Andrey Zagrebin <azagre...@apache.org> Cc: Jiawei Wu <wujiawei5837...@gmail.com>, user <user@flink.apache.org> Subject: RE: [EXTERNAL] Flink DynamoDB stream connector losing records CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. Hi Jiawei: Sorry for the delayed reply. When you mention certain records getting skipped, is it from the same run or across different runs. Any more specific details on how/when records are lost? FlinkDynamoDBStreamsConsumer is built on top of FlinkKinesisConsumer , with similar offset management mechanism. In theory it shouldn't lose exactly-once semantics in the case of getting throttled. We haven't run it in any AWS kinesis analytics environment though. Thanks. On Thu, Sep 10, 2020 at 7:51 AM Andrey Zagrebin <azagre...@apache.org<mailto:azagre...@apache.org>> wrote: Generally speaking this should not be a problem for exactly-once but I am not familiar with the DynamoDB and its Flink connector. Did you observe any failover in Flink logs? On Thu, Sep 10, 2020 at 4:34 PM Jiawei Wu <wujiawei5837...@gmail.com<mailto:wujiawei5837...@gmail.com>> wrote: And I suspect I have throttled by DynamoDB stream, I contacted AWS support but got no response except for increasing WCU and RCU. Is it possible that Flink will lose exactly-once semantics when throttled? On Thu, Sep 10, 2020 at 10:31 PM Jiawei Wu <wujiawei5837...@gmail.com<mailto:wujiawei5837...@gmail.com>> wrote: Hi Andrey, Thanks for your suggestion, but I'm using Kinesis analytics application which supports only Flink 1.8.... Regards, Jiawei On Thu, Sep 10, 2020 at 10:13 PM Andrey Zagrebin <azagre...@apache.org<mailto:azagre...@apache.org>> wrote: Hi Jiawei, Could you try Flink latest release 1.11? 1.8 will probably not get bugfix releases. I will cc Ying Xu who might have a better idea about the DinamoDB source. Best, Andrey On Thu, Sep 10, 2020 at 3:10 PM Jiawei Wu <wujiawei5837...@gmail.com<mailto:wujiawei5837...@gmail.com>> wrote: Hi, I'm using AWS kinesis analytics application with Flink 1.8. I am using the FlinkDynamoDBStreamsConsumer to consume DynamoDB stream records. But recently I found my internal state is wrong. After I printed some logs I found some DynamoDB stream record are skipped and not consumed by Flink. May I know if someone encountered the same issue before? Or is it a known issue in Flink 1.8? Thanks, Jiawei