Hi, I have a use case where we are reading events from kinesis stream.The event can look like this Event { event_id, transaction_id key1, key2, value, timestamp, some other fields... }
We want to aggregate the values per key for all events we have seen till now (as simple as "select key1, key2, sum(value) from events group by key1, key2key."). For this I have created a simple flink job which uses flink-kinesis connector and applies keyby() and sum() on the incoming events. I am facing two challenges here: 1) The incoming events can have duplicates. How to maintain exactly once processing here, as processing duplicate events can give me erroneous result? The field transaction_id will be unique for each events. If two events have same transaction_id, we can say that they are duplicates (By duplicates here, I don't just mean the retried ones. The same message can be present in kinesis with different sequence number. I am not sure if flink-kinesis connector can handle that, as it is using KCL underlying which I assume doesn't take care of it) 2) There can be the the cases where the value has been updated for a key after processing the event and we may want to reprocess those events with new value. Since this is just a value change, the transaction_id will be same. The idempotency logic will not allow to reprocess the events. What are the ways to handle such scenarios in flink? Thanks Pooja -- Warm Regards, Pooja Agrawal