Hi there,

I understand Flink currently doesn't support handling late-arriving events in this situation. In reality, an exactly-once Flink job needs to deal with missing data or backfills from time to time without rewinding to a previous savepoint, since a restored job would suffer a blackout while it tries to catch up.
In general, I don't know if there are good solutions for all of these scenarios. Some jobs might keep messages in their windows longer (messages not purged yet); others might kick off another pipeline that deals only with the affected windows (messages already purged). What would be the suggested patterns?

For the keep-messages-longer approach, we need to coordinate the backfill messages with the current ongoing messages so that the windows they are assigned to don't fire until all of them have arrived. One challenge of this approach is determining when all the backfill messages have been processed. Ideally there would be a customized barrier that travels through the entire topology and tells the windows that the backfill is done; that would work for both non-keyed and keyed streams, but I don't think Flink supports this yet.

Another way would be to use session window merging and extend the window purging time with some reasonable estimation (rough sketch at the end of this mail). Since this approach is based on estimation, it may also add execution latency to those windows.

Which would be the suggested way in general?

Thanks,
Chen
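
P.S. For concreteness, here is roughly what I mean by keeping window state around longer. It is only a minimal sketch: it assumes event-time tumbling windows and a reasonably recent 1.x DataStream API, and it uses allowedLateness plus a late-data side output to extend the purging time by a fixed estimate instead of session-window merging. The key type, lateness bound, and source are all placeholders, not recommendations.

import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.OutputTag;

public class LateBackfillSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder source of (key, value) pairs; a real job would read from Kafka
        // and extract the event time from the record instead of using processing time.
        DataStream<Tuple2<String, Long>> events = env
                .fromElements(Tuple2.of("k", 1L), Tuple2.of("k", 2L))
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy
                                .<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofMinutes(5))
                                .withTimestampAssigner((event, ts) -> System.currentTimeMillis()));

        // Records arriving after the allowed lateness has expired are routed here
        // instead of being dropped, so a separate backfill pipeline could handle the
        // already-purged windows (the second pattern above).
        final OutputTag<Tuple2<String, Long>> tooLate =
                new OutputTag<Tuple2<String, Long>>("too-late") {};

        SingleOutputStreamOperator<Tuple2<String, Long>> windowed = events
                .keyBy(t -> t.f0)
                .window(TumblingEventTimeWindows.of(Time.minutes(10)))
                // Keep window state past the watermark so backfilled records still land
                // in their window; each late record re-fires the window. The six hours
                // is the "reasonable estimation" and is a made-up number here.
                .allowedLateness(Time.hours(6))
                .sideOutputLateData(tooLate)
                .sum(1);

        windowed.print();
        windowed.getSideOutput(tooLate).print();

        env.execute("late-backfill-sketch");
    }
}

The open question from above still applies to this sketch: the lateness bound is only an estimate, and there is no barrier-like signal to tell the windows that the backfill is actually complete.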