Is Flink processing really deterministic across runs when the incoming stream of elements is out of order? How is that ensured?
I am aware of the principles of event time and watermarking. But I can't understand how it works when there are late elements in the stream, that is, elements violating the watermark condition by having a lower timestamp than a previously emitted watermark. From my point of view these elements will flow through the system without any mechanism that would discard them. Late elements then may, or may not, fall into existing windows.

Let's draw a simple example: reading from a Kafka source with two partitions. Numbers represent event times. Data from Kafka are shuffled to a single WindowOperator calculating a sum (all elements have the same key).

-----------------------------
| part. 1 | ..., 15, 12, 9 | ------->|                          |
-----------------------------        |      WindowOperator      |
| part. 2 | ..., 18, 6, 11 | ------->| (window maxTimestamp 10) |
-----------------------------        ----------------------------

Elements can arrive at the WindowOperator in arbitrary order.

example 1 (E denotes an element, W denotes a watermark):

E 9
W 9
E 11
W 11    (current watermark: min(9, 11) = 9)
E 6
E 12
W 12    (current watermark: min(12, 11) = 11)
---------------
window (0, 10) fires with sum 15 (the late element 6 arrived before the window fired)

example 2:

E 9
W 9
E 11
W 11    (current watermark: min(9, 11) = 9)
E 12
W 12    (current watermark: min(12, 11) = 11)
---------------
window (0, 10) fires with sum 9 (the late element 6 has not arrived yet)

In my example the result is not deterministic; it is more or less random. Is there anything I am missing? Thank you very much for an explanation.
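To make the setup concrete, here is a rough sketch of the pipeline I have in mind, using Flink's DataStream API in Java. The topic name "events", the EventSchema deserializer, and the Tuple3<String, Long, Integer> record layout (key, event-time timestamp, value) are placeholders chosen for illustration, not real code from my job:

import java.util.Properties;

import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;

public class OutOfOrderSumExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "out-of-order-example");

        // EventSchema is a hypothetical DeserializationSchema that decodes
        // records into (key, event-time timestamp, value) tuples.
        FlinkKafkaConsumer010<Tuple3<String, Long, Integer>> consumer =
                new FlinkKafkaConsumer010<>("events", new EventSchema(), props);

        // Attaching the assigner to the consumer yields per-partition
        // watermarks; the operator-level watermark is the minimum over
        // all partitions, as in the min(...) computations above. Zero
        // out-of-orderness means the watermark equals the highest
        // timestamp seen so far, so element 6 after watermark 9 is late.
        consumer.assignTimestampsAndWatermarks(
                new BoundedOutOfOrdernessTimestampExtractor<Tuple3<String, Long, Integer>>(
                        Time.milliseconds(0)) {
                    @Override
                    public long extractTimestamp(Tuple3<String, Long, Integer> element) {
                        return element.f1; // event-time timestamp carried in the record
                    }
                });

        DataStream<Tuple3<String, Long, Integer>> events = env.addSource(consumer);

        events
                .keyBy(0)                          // all elements share the same key
                .timeWindow(Time.milliseconds(10)) // tumbling window, e.g. (0, 10)
                // A late element arriving before the watermark passes
                // end-of-window + allowedLateness still updates the window;
                // anything later is dropped. With zero allowed lateness,
                // element 6 only counts if it arrives before the window fires.
                .allowedLateness(Time.milliseconds(0))
                .sum(2)                            // sum the value field f2
                .print();

        env.execute("out-of-order-sum-example");
    }
}

With allowedLateness(0), whether element 6 contributes to the sum seems to depend purely on its arrival order relative to the watermark, which is exactly the non-determinism described above.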