Thanks David for your detailed answers. Mans On Wednesday, December 11, 2019, 08:12:51 AM EST, David Anderson <da...@ververica.com> wrote: If we have allowed lateness to be greater than 0 (say 5), then if an event which arrives at window end + 3 (within allowed lateness),
(a) it is considered late and included in the window function as a late firing ? An event with a timestamp that falls within the window's boundaries that arrives when the current watermark is at window end + 3 will be included as a late event that has arrived within the allowed lateness. Actually, I'm not sure I got this right -- on this point I recommend some experimentation, or careful reading of the code. On Wed, Dec 11, 2019 at 2:08 PM David Anderson <da...@ververica.com> wrote: I'll attempt to answer your questions. If we have allowed lateness to be greater than 0 (say 5), then if an event which arrives at window end + 3 (within allowed lateness), (a) it is considered late and included in the window function as a late firing ? An event with a timestamp that falls within the window's boundaries that arrives when the current watermark is at window end + 3 will be included as a late event that has arrived within the allowed lateness. (b) Are the late firings under the control of the trigger ? Yes, the trigger is involved in all firings, late or not. (c) If there are may events like this - are there multiple window function invocations ? With the default event time trigger, each late event causes a late firing. You could use a custom trigger to implement other behaviors. (d) Are these events (still within window end + allowed lateness) also emitted via the side output late data ? No. The side output for late events is only used to collect events that fall outside the allowed lateness. 2. If an event arrives after the window end + allowed lateness - (a) Is it excluded from the window function but still emitted from the side output late data ? Yes. (b) And if it is emitted is there any attribute which indicates for which window it was a late event ? No, the event is emitted without any additional information. (c) Is there any time limit while the late side output remains active for a particular window or all late events channeled to it ? There is no time limit; the late side output remains operative indefinitely. Hope that helps,David On Wed, Dec 11, 2019 at 1:40 PM M Singh <mans2si...@yahoo.com> wrote: Thanks Timo for your answer. I will try the prototype but was wondering if I can find some theoretical documentation to give me a sound understanding. Mans On Wednesday, December 11, 2019, 05:44:15 AM EST, Timo Walther <twal...@apache.org> wrote: Little mistake: The key must be any constant instead of `e`. On 11.12.19 11:42, Timo Walther wrote: > Hi Mans, > > I would recommend to create a little prototype to answer most of your > questions in action. > > You can simple do: > > stream = env.fromElements(1L, 2L, 3L, 4L) > .assignTimestampsAndWatermarks( > new AssignerWithPunctuatedWatermarks{ > extractTimestamp(e) = e, > checkAndGetNextWatermark(e, ts) = new Watermark(e) > }) > > stream.keyBy(e -> e).window(...).print() > env.execute() > > This allows to quickly create a stream of event time for testing the > semantics. > > I hope this helps. Otherwise of course we can help you in finding the > answers to the remaining questions. > > Regards, > Timo > > > > On 10.12.19 20:32, M Singh wrote: >> Hi: >> >> I have a few questions about the side output late data. >> >> Here is the API >> >> |stream .keyBy(...) <- keyed versus non-keyed windows .window(...) <- >> required: "assigner" [.trigger(...)] <- optional: "trigger" (else >> default trigger) [.evictor(...)] <- optional: "evictor" (else no >> evictor) [.allowedLateness(...)] <- optional: "lateness" (else zero) >> [.sideOutputLateData(...)] <- optional: "output tag" (else no side >> output for late data) .reduce/aggregate/fold/apply() <- required: >> "function" [.getSideOutput(...)] <- optional: "output tag"| >> >> >> >> Apache Flink 1.9 Documentation: Windows >> <https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/operators/windows.html#late-elements-considerations> >> >> >> >> >> >> >> Apache Flink 1.9 Documentation: Windows >> >> <https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/operators/windows.html#late-elements-considerations> >> >> >> >> >> Here is the documentation: >> >> >> Late elements >> >> considerations<https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/operators/windows.html#late-elements-considerations> >> >> >> >> When specifying an allowed lateness greater than 0, the window along >> with its content is kept after the watermark passes the end of the >> window. In these cases, when a late but not dropped element arrives, >> it could trigger another firing for the window. These firings are >> called |late firings|, as they are triggered by late events and in >> contrast to the |main firing| which is the first firing of the window. >> In case of session windows, late firings can further lead to merging >> of windows, as they may “bridge” the gap between two pre-existing, >> unmerged windows. >> >> Attention You should be aware that the elements emitted by a late >> firing should be treated as updated results of a previous computation, >> i.e., your data stream will contain multiple results for the same >> computation. Depending on your application, you need to take these >> duplicated results into account or deduplicate them. >> >> >> Questions: >> >> 1. If we have allowed lateness to be greater than 0 (say 5), then if >> an event which arrives at window end + 3 (within allowed lateness), >> (a) it is considered late and included in the window function as >> a late firing ? >> (b) Are the late firings under the control of the trigger ? >> (c) If there are may events like this - are there multiple window >> function invocations ? >> (d) Are these events (still within window end + allowed lateness) >> also emitted via the side output late data ? >> 2. If an event arrives after the window end + allowed lateness - >> (a) Is it excluded from the window function but still emitted >> from the side output late data ? >> (b) And if it is emitted is there any attribute which indicates >> for which window it was a late event ? >> (c) Is there any time limit while the late side output remains >> active for a particular window or all late events channeled to it ? >> >> Thanks >> >> Thanks >> >> Mans >> >> >> >> >>