Flinkheads, I'm consuming from a Kafka source, using event time with watermarks generated from a fixed out-of-orderness threshold, and tumbling event-time windows to compute a rollup. My sink is idempotent, and I want to ensure exactly-once processing end-to-end.
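For concreteness, here is roughly what the job looks like. This is a sketch rather than my actual code: the topic, group.id, threshold, window size, record format, and the count-style aggregate are all illustrative, and print() stands in for my idempotent sink.

import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class RollupJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        env.enableCheckpointing(10000); // checkpoint every 10s; default backend keeps snapshots in memory

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092");
        props.setProperty("zookeeper.connect", "zk:2181"); // the 0.8 consumer commits offsets here
        props.setProperty("group.id", "rollup-job");

        env.addSource(new FlinkKafkaConsumer08<>("events", new SimpleStringSchema(), props))
            // each record is "key,timestampMillis"; turn it into (key, eventTime, 1)
            .map(new MapFunction<String, Tuple3<String, Long, Long>>() {
                @Override
                public Tuple3<String, Long, Long> map(String line) {
                    String[] parts = line.split(",");
                    return new Tuple3<>(parts[0], Long.parseLong(parts[1]), 1L);
                }
            })
            // watermark trails the highest event time seen by a fixed 30s threshold
            .assignTimestampsAndWatermarks(
                new BoundedOutOfOrdernessTimestampExtractor<Tuple3<String, Long, Long>>(Time.seconds(30)) {
                    @Override
                    public long extractTimestamp(Tuple3<String, Long, Long> event) {
                        return event.f1;
                    }
                })
            .keyBy(0)                                              // key by the first tuple field
            .window(TumblingEventTimeWindows.of(Time.minutes(1)))  // the tumbling rollup window
            .sum(2)                                                // per-key count per window
            .print();                                              // stand-in for the idempotent sink

        env.execute("rollup");
    }
}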
I am trying to figure out if I can stick with memory checkpointing and not bother with a checkpointing state backend or savepoints for job redeploys. It would be great if I could just rely on the Kafka consumer persisting offsets to ZooKeeper for that 'group.id'; I see that it saves the relevant offset to ZooKeeper when a checkpoint has been triggered.

However, I'm concerned that there is potential for dropping events if I stick with memory checkpoints. The documentation talks of a checkpoint being triggered once the relevant barrier has made it all the way to the sink. How does that interact with windowed streams, where some events get buffered in window state while later ones make it through? More concretely, when the Kafka consumer persists an offset to ZooKeeper on receiving a checkpoint trigger, can I trust that no events from before that offset are still held in any windowing intermediate state?

Thanks!
Shikhar
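P.S. To frame the alternative I'm hoping to avoid: my understanding is that enabling checkpointing alone keeps the default in-memory state backend, and switching to a durable backend would look something like this (the HDFS path is made up):

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // With only this, snapshots go to the default memory state backend
        // (held on the JobManager heap):
        env.enableCheckpointing(10000);
        // What I'd switch to if memory checkpoints can lose window state
        // (illustrative checkpoint directory):
        env.setStateBackend(new FsStateBackend("hdfs://namenode:8020/flink/checkpoints"));
    }
}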