Hi there,

I understand Flink currently doesn't support handling late-arriving events.
In practice, an exactly-once Flink job needs to deal with missing data or
backfill from time to time without rewinding to a previous savepoint, since a
job restored from a savepoint suffers a blackout while it catches up.

In general, I don't know whether there are good solutions for all of these
scenarios. One option might be to keep messages in windows longer (when the
messages haven't been purged yet); another might be to kick off a separate
pipeline that deals only with the affected windows (when the messages have
already been purged). What would be the suggested patterns?
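
To make the second pattern a bit more concrete, below is a rough sketch of
what I have in mind (the Event type, the field names, and the idea of
splitting on the current watermark are all just placeholders of mine, and it
assumes timestamps/watermarks are assigned upstream): anything already behind
the watermark goes to a side output that a dedicated backfill path consumes,
while everything else flows on to the normal windows.

import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class BackfillSplitter extends ProcessFunction<BackfillSplitter.Event, BackfillSplitter.Event> {

    /** Minimal stand-in event type: a key plus an event-time timestamp in millis. */
    public static class Event {
        public String key;
        public long timestamp;
        public Event() {}
        public Event(String key, long timestamp) { this.key = key; this.timestamp = timestamp; }
    }

    /** Side output for events whose window has probably fired/purged already. */
    public static final OutputTag<Event> BACKFILL = new OutputTag<Event>("backfill-events") {};

    @Override
    public void processElement(Event event, Context ctx, Collector<Event> out) {
        // Watermarks must be assigned upstream for this comparison to mean anything.
        long currentWatermark = ctx.timerService().currentWatermark();
        if (event.timestamp <= currentWatermark) {
            // The window this event belongs to may already be gone:
            // hand the event to a dedicated backfill pipeline instead.
            ctx.output(BACKFILL, event);
        } else {
            // Normal path: let the existing windowed pipeline handle it.
            out.collect(event);
        }
    }
}

The main stream would keep feeding the existing windowed pipeline via
events.process(new BackfillSplitter()), while the side output obtained with
getSideOutput(BackfillSplitter.BACKFILL) would feed whatever recomputes or
patches the affected windows.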

For the keep-messages-longer approach, we need to coordinate backfill messages
with the current ongoing messages so that windows assigned both kinds of
messages do not fire until all of those messages have arrived. One challenge
of this approach is determining when all backfill messages have been
processed. Ideally there would be a customized barrier that travels through
the entire topology and tells the windows that the backfill is done; this
would work for both non-keyed and keyed streams, but I don't think Flink
supports it yet. Another way would be to use session window merging and
extend the window purging time by some reasonable estimate. This approach is
based on estimation and may add execution latency to those windows.
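
Instead of session window merging, the "hold the window a bit longer based on
an estimate" part could perhaps also be expressed as a custom Trigger that
fires at the normal window end but only purges after an estimated backfill
delay. Below is a rough sketch (the class name and the delay handling are
made up by me, not an established pattern):

import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

/** Fires at the normal window end, but only purges after an estimated backfill delay. */
public class DelayedPurgeTrigger extends Trigger<Object, TimeWindow> {

    private final long estimatedBackfillDelayMs;

    public DelayedPurgeTrigger(long estimatedBackfillDelayMs) {
        this.estimatedBackfillDelayMs = estimatedBackfillDelayMs;
    }

    @Override
    public TriggerResult onElement(Object element, long timestamp, TimeWindow window,
                                   TriggerContext ctx) throws Exception {
        long purgeTime = window.maxTimestamp() + estimatedBackfillDelayMs;
        if (window.maxTimestamp() <= ctx.getCurrentWatermark()) {
            // Backfill element for a window that already fired: emit an updated result now.
            ctx.registerEventTimeTimer(purgeTime);
            return TriggerResult.FIRE;
        }
        // Normal element: fire at the window end, purge only after the estimated delay.
        ctx.registerEventTimeTimer(window.maxTimestamp());
        ctx.registerEventTimeTimer(purgeTime);
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
        if (time == window.maxTimestamp()) {
            return TriggerResult.FIRE;            // the "on time" result
        }
        if (time == window.maxTimestamp() + estimatedBackfillDelayMs) {
            return TriggerResult.FIRE_AND_PURGE;  // final result once the estimate has passed
        }
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(TimeWindow window, TriggerContext ctx) {
        ctx.deleteEventTimeTimer(window.maxTimestamp());
        ctx.deleteEventTimeTimer(window.maxTimestamp() + estimatedBackfillDelayMs);
    }
}

It would be attached with something like
stream.keyBy(...).window(TumblingEventTimeWindows.of(Time.minutes(10))).trigger(new DelayedPurgeTrigger(3_600_000L)).
Depending on the Flink version, the window operator may also clean up window
state on its own once the watermark passes the window end, so the
state-retention side would need checking as well.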

Which would be the suggested way in general?

Thanks,
Chen
