Hi Ying and Danny,

Sorry for the late reply; I just got back from vacation.
Yes, I'm running Flink 1.8 on Kinesis Data Analytics, and checkpointing is enabled. This fully managed solution limits my access to Flink logs; so far I haven't seen any logs related to throttling or failover. The reason I suspect throttling is the root cause is that some AWS Lambda functions connected to the same DynamoDB stream show higher throttling right after Flink starts consuming the stream, so I believe the throttling also happens on the Flink side. I'm actively working with AWS support to find logs confirming this.

At the same time, when you say "in theory should not lose exactly-once semantics", does that mean Flink will retry when throttled? I noticed there is a parameter "flink.shard.getrecords.maxretries" whose default value is 3. Will Flink skip the record when all retry attempts fail?

Thanks,
Jiawei

On Tue, Sep 15, 2020 at 4:38 PM Cranmer, Danny <cranm...@amazon.com> wrote:
> Hi Jiawei,
>
> I agree that the offset management mechanism uses the same code as the
> Kinesis Stream Consumer and in theory should not lose exactly-once
> semantics. As Ying is alluding to, if your application is restarted and
> you have snapshotting disabled in AWS, there is a chance that records can
> be lost between runs. However, if you have snapshotting enabled, then the
> application should continue consuming records from the last processed
> sequence number.
>
> I am happy to take a deeper look if you can provide more
> information/logs/code.
>
> Thanks,
>
> *From:* Ying Xu <y...@lyft.com>
> *Date:* Monday, 14 September 2020 at 19:48
> *To:* Andrey Zagrebin <azagre...@apache.org>
> *Cc:* Jiawei Wu <wujiawei5837...@gmail.com>, user <user@flink.apache.org>
> *Subject:* RE: [EXTERNAL] Flink DynamoDB stream connector losing records
>
> Hi Jiawei:
>
> Sorry for the delayed reply. When you mention certain records getting
> skipped, is it from the same run or across different runs? Any more
> specific details on how/when records are lost?
>
> FlinkDynamoDBStreamsConsumer is built on top of FlinkKinesisConsumer,
> with a similar offset management mechanism. In theory it shouldn't lose
> exactly-once semantics in the case of getting throttled. We haven't run
> it in any AWS Kinesis Analytics environment, though.
>
> Thanks.
>
> On Thu, Sep 10, 2020 at 7:51 AM Andrey Zagrebin <azagre...@apache.org>
> wrote:
>
> Generally speaking, this should not be a problem for exactly-once, but I
> am not familiar with DynamoDB and its Flink connector.
> Did you observe any failover in the Flink logs?
>
> On Thu, Sep 10, 2020 at 4:34 PM Jiawei Wu <wujiawei5837...@gmail.com>
> wrote:
>
> And I suspect I have been throttled by the DynamoDB stream. I contacted
> AWS support but got no response except a suggestion to increase WCU and
> RCU.
>
> Is it possible that Flink will lose exactly-once semantics when throttled?
>
> On Thu, Sep 10, 2020 at 10:31 PM Jiawei Wu <wujiawei5837...@gmail.com>
> wrote:
>
> Hi Andrey,
>
> Thanks for your suggestion, but I'm using a Kinesis Analytics application,
> which supports only Flink 1.8....
>
> Regards,
> Jiawei
>
> On Thu, Sep 10, 2020 at 10:13 PM Andrey Zagrebin <azagre...@apache.org>
> wrote:
>
> Hi Jiawei,
>
> Could you try the latest Flink release, 1.11?
> 1.8 will probably not get bugfix releases.
> I will cc Ying Xu, who might have a better idea about the DynamoDB source.
>
> Best,
> Andrey
>
> On Thu, Sep 10, 2020 at 3:10 PM Jiawei Wu <wujiawei5837...@gmail.com>
> wrote:
>
> Hi,
>
> I'm using an AWS Kinesis Analytics application with Flink 1.8. I am using
> FlinkDynamoDBStreamsConsumer to consume DynamoDB stream records. But
> recently I found my internal state is wrong.
>
> After I printed some logs, I found some DynamoDB stream records are
> skipped and not consumed by Flink. May I know if someone has encountered
> the same issue before? Or is it a known issue in Flink 1.8?
>
> Thanks,
> Jiawei
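For anyone finding this thread later: the "flink.shard.getrecords.maxretries" setting discussed above is a consumer property of the Flink Kinesis connector, and FlinkDynamoDBStreamsConsumer accepts the same java.util.Properties object. Below is a minimal sketch of how that configuration might look. The region, stream name, and the retry count of 10 are made-up example values, and the commented-out consumer construction assumes the connector classes are on the classpath; in the real connector the keys are also exposed as constants on ConsumerConfigConstants.

```java
import java.util.Properties;

public class DynamoDbConsumerConfig {

    // Property keys read by the Flink Kinesis / DynamoDB streams consumer.
    static final String AWS_REGION = "aws.region";
    static final String GETRECORDS_MAX_RETRIES = "flink.shard.getrecords.maxretries";

    static Properties buildConsumerConfig() {
        Properties config = new Properties();
        config.setProperty(AWS_REGION, "us-east-1"); // example region
        // Raise the per-shard GetRecords retry budget from the default of 3,
        // so that transient throttling is retried more times before the
        // fetch gives up and the task fails.
        config.setProperty(GETRECORDS_MAX_RETRIES, "10");
        return config;
    }

    public static void main(String[] args) {
        Properties config = buildConsumerConfig();
        // In a real Flink job this would be passed to the consumer, e.g.:
        //   env.addSource(new FlinkDynamoDBStreamsConsumer<>(
        //       "my-ddb-stream", new SimpleStringSchema(), config));
        System.out.println(config);
    }
}
```

As far as I can tell from the connector code, exhausting the retries does not silently skip the record: the fetch throws, the task fails over, and with checkpointing enabled Flink resumes from the last checkpointed sequence number, which is how the exactly-once guarantee is meant to be preserved.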