Hi,

We are using Apache Flink 1.19. We keep a hash state in our keyed process function, with a TTL configured on this state so entries are automatically evicted after the TTL expires.
We store each record's hash in this state, so if the same record arrives again it is not sent downstream. Now suppose the task manager restarts before a checkpoint completes. The upstream will replay the same records, since they have not been checkpointed yet. In that case it seems quite possible that a replayed record is routed to a different task manager, where the previous hash state for that key does not exist, so the same record is processed again and sent downstream.

Is this a likely scenario, the way I have described it? If so, how do we guarantee de-duplication?

Thanks,
Sachin
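P.S. To make the failure mode concrete, here is a minimal stand-alone sketch of what I mean (plain Java, no Flink; the names and the slot-routing are illustrative, not our actual job). Per-slot sets stand in for the keyed hash state, and clearing them stands in for a restart before any checkpoint:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simulates keyed dedup state kept per task slot. A "restart" before a
// checkpoint wipes the state, so a replayed record passes dedup again.
public class DedupReplaySketch {

    // Per-slot "keyed state": slot index -> set of record hashes seen.
    static Map<Integer, Set<String>> slots = new HashMap<>();

    // Route a record hash to a slot, loosely mimicking key-group assignment.
    static int slotFor(String hash, int parallelism) {
        return Math.floorMod(hash.hashCode(), parallelism);
    }

    // Returns true if the record would be emitted downstream (not a duplicate).
    static boolean process(String hash, int parallelism) {
        int slot = slotFor(hash, parallelism);
        Set<String> seen = slots.computeIfAbsent(slot, k -> new HashSet<>());
        return seen.add(hash); // add() returns false if the hash was already present
    }

    public static void main(String[] args) {
        // First delivery: record is new, goes downstream.
        System.out.println(process("record-42", 4)); // true

        // Same record replayed while state is intact: deduplicated.
        System.out.println(process("record-42", 4)); // false

        // Restart before any checkpoint: un-checkpointed state is lost.
        slots.clear();

        // Replay after restart: the duplicate goes downstream again.
        System.out.println(process("record-42", 4)); // true
    }
}
```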
