Re: Cannot see all events in window apply() for big input

2016-11-10 Thread Sendoh
Hi, We let watermark proceed at the earliest timestamp among all event types. Our test result looks correct. /* * Watermark proceeds at the earliest timestamp among all the event types * */ public class EventsWatermark implements AssignerWithPeriodicWatermarks> { private final long maxTimeLa

Re: Cannot see all events in window apply() for big input

2016-11-08 Thread Till Rohrmann
Flink does not support per key watermarks or type sensitive watermarks. The underlying assumption is that you have a global watermark which defines the progress wrt to event time in your topology. The easiest way would be to have an input which has a monotonically increasing timestamp. Alternative

Re: Cannot see all events in window apply() for big input

2016-11-08 Thread Sendoh
Thank you for confirming. What would you think an efficient way not having global watermark? The following logic fails to build Watermark per KeyStream: jsonStreams.keyBy(new JsonKeySelector()).assignTimestampsAndWatermarks(new JsonWatermark()).keyBy(JsonKeySelector()).window( So, using split

Re: Cannot see all events in window apply() for big input

2016-11-08 Thread Till Rohrmann
Hi Sendoh, Flink should actually never lose data unless it is so late that it arrives after the allowed lateness. This should be independent of the total data size. The watermarks are indeed global and not bound to a specific input element or a group. So for example if you create the watermarks f

Re: Cannot see all events in window apply() for big input

2016-11-08 Thread Sendoh
Hi, Would the issue be events are too out of ordered and the watermark is global? We want to count event per event type per day, and the data looks like: eventA, 10-29-XX eventB,, 11-02-XX eventB,, 11-02-XX eventB,, 11-03-XX eventB,, 11-04-XX eventA, 10-29-XX eventA, 10-30-XX eventA, 1

Re: Cannot see all events in window apply() for big input

2016-11-08 Thread Sendoh
Yes. the other job performs event time window and we tried 1.2-SNAPSHOT and 1.1.3. The old version 1.0.3 we lost much much less data. We tried both windowAll() and keyBy() window() already, and tried very tiny lag and window(1 millisecond). My doubt comes from smaller input works while bigger inpu

Re: Cannot see all events in window apply() for big input

2016-11-07 Thread Till Rohrmann
And this other job also performs a window operation based on event time? What do you mean with “I have a doubt is the necessary parallelism for window operation if reprocessing a skew input from Kafka”? Also be aware that the windowAll operation is executed with a dop of 1, making it effectively

Re: Cannot see all events in window apply() for big input

2016-11-07 Thread Sendoh
Hi Till. Thank you for suggesting. We know the timestamp is correct because another Flink job is running with the three topics correctly. We also know the operators work well before window apply() because we check the result before window apply(). What currently I have a doubt is the necessary pa

Re: Cannot see all events in window apply() for big input

2016-11-07 Thread Till Rohrmann
Hi Sendoh, from your description it's really hard to figure out what the problem could be. The first thing to do would be check how many records you actually consume from Kafka and how many items are outputted. Next I would take a look at the timestamp extractor. Can it be that records are discard