Flinkheads,

I'm consuming from a Kafka source, using event time with watermarks based
on an out-of-orderness threshold, and using tumbling event-time windows to
perform some rollup. My sink is idempotent, and I want to ensure
exactly-once processing end-to-end.
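
For concreteness, here's roughly the shape of the job -- a minimal sketch
assuming the Kafka 0.8 connector (the one that commits offsets to
ZooKeeper); Event, EventSchema, MyIdempotentSink, and the topic/host names
are placeholders rather than my real code:

    import java.util.Properties;
    import org.apache.flink.streaming.api.TimeCharacteristic;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;

    public class RollupJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
            env.enableCheckpointing(10_000); // checkpoint every 10s

            Properties props = new Properties();
            props.setProperty("zookeeper.connect", "zk:2181");    // offsets get committed here
            props.setProperty("bootstrap.servers", "kafka:9092"); // placeholder broker
            props.setProperty("group.id", "rollup-job");          // the 'group.id' in question

            env.addSource(new FlinkKafkaConsumer08<>("events", new EventSchema(), props))
               // watermark trails the highest timestamp seen by a fixed threshold
               .assignTimestampsAndWatermarks(
                   new BoundedOutOfOrdernessTimestampExtractor<Event>(Time.seconds(10)) {
                       @Override
                       public long extractTimestamp(Event e) { return e.timestampMillis; }
                   })
               .keyBy("someKey")                 // placeholder key field
               .timeWindow(Time.minutes(1))      // tumbling event-time windows
               .reduce((a, b) -> a.mergeWith(b)) // the rollup; mergeWith is a placeholder
               .addSink(new MyIdempotentSink()); // idempotent writes downstream

            env.execute("kafka-rollup");
        }
    }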

I am trying to figure out whether I can stick with in-memory checkpointing,
and not bother with a durable checkpointing state backend or with savepoints
for job redeploys. It'd be great if I could just rely on the Kafka
consumer's offset persistence to ZooKeeper for that 'group.id' -- I see that
it commits the relevant offsets to ZooKeeper when a checkpoint completes.
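
In code, the hope is that something like this is all the state-backend
configuration I need (continuing the sketch above; as I understand it, the
memory backend keeps checkpointed state on the JobManager heap, so none of
it survives losing the JobManager):

    import org.apache.flink.runtime.state.memory.MemoryStateBackend;

    // Checkpoint state lives on the JobManager heap -- nothing durable.
    // The bet is that the offsets committed to ZooKeeper on checkpoint
    // completion are all that's needed to resume after a redeploy.
    env.setStateBackend(new MemoryStateBackend());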

However, I'm concerned that there is potential for dropping events if I
stick with in-memory checkpoints.

The documentation describes a checkpoint completing once the corresponding
barrier has made it all the way to the sinks -- how does that interact with
windowed streams, where some events may still be buffered in window state
while later ones make it through?

More concretely, when the Kafka consumer persists an offset to ZooKeeper
upon checkpoint completion, can I trust that no events from before that
offset are still held in any window's intermediate state? For example, if
the committed offset is past an event that is sitting in a still-open
one-minute window, restoring from that offset alone after a failure would
silently drop it.

Thanks!

Shikhar


