Hi,
We are using Apache Flink 1.19

We keep a hash state in our KeyedProcessFunction, with a TTL configured on the
state so entries are automatically evicted once the TTL expires.

We store a record's hash in the state, so if the same record arrives again,
it won't be sent downstream.
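To make the setup concrete, here is a minimal sketch of what I mean (not our exact code; the class name, record type, and the 1-hour TTL are just illustrative):

```java
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Stream is keyed by the record's hash; state just marks "seen".
public class DedupFunction extends KeyedProcessFunction<String, String, String> {

    private transient ValueState<Boolean> seen;

    @Override
    public void open(Configuration parameters) {
        StateTtlConfig ttlConfig = StateTtlConfig
                .newBuilder(Time.hours(1))  // illustrative TTL
                .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
                .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
                .build();

        ValueStateDescriptor<Boolean> desc =
                new ValueStateDescriptor<>("seen-hash", Boolean.class);
        desc.enableTimeToLive(ttlConfig);
        seen = getRuntimeContext().getState(desc);
    }

    @Override
    public void processElement(String record, Context ctx, Collector<String> out)
            throws Exception {
        if (seen.value() == null) {
            // First occurrence of this hash: remember it and forward the record.
            seen.update(true);
            out.collect(record);
        }
        // Duplicate within the TTL window: drop silently.
    }
}
```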

Now suppose a task manager restarts before a checkpoint completes. On recovery,
the upstream will replay the same records, since they were not covered by the
last successful checkpoint.

In that case it seems quite possible that a replayed record is now routed to a
different task manager, where the previous hash state for that key does not
exist, so the same record is processed again and sent downstream.

Is this a likely scenario, the way I have described it?

If so, how do we guarantee de-duplication?

Thanks
Sachin
