Hi Sendoh,

Flink should actually never lose data unless an element is so late that it arrives after the allowed lateness has expired. This is independent of the total data size.
The watermarks are indeed global and not bound to a specific input element or group. For example, if you generate the watermarks from the timestamp information of your events and you have the input sequence (eventA, 01-01), (eventB, 02-01), (eventC, 01-02), then the watermark W(02-01) is generated after the second event. The third event is then a late element, and if it exceeds the allowed lateness, it will be discarded. What you have to make sure of is that the events in your queue have monotonically increasing timestamps if you generate the watermarks from a timestamp field of the events.

Cheers,
Till

On Tue, Nov 8, 2016 at 3:37 PM, Sendoh <unicorn.bana...@gmail.com> wrote:
> Hi,
>
> Could the issue be that the events are too far out of order and the
> watermark is global?
>
> We want to count events per event type per day, and the data looks like:
>
> eventA, 10-29-XX
> eventB, 11-02-XX
> eventB, 11-02-XX
> eventB, 11-03-XX
> eventB, 11-04-XX
> ....
> ....
> eventA, 10-29-XX
> eventA, 10-30-XX
> eventA, 10-30-XX
> .
> .
> .
> eventA, 11-04-XX
>
> eventA is much, much larger than eventB,
> and it looks like we lost the counts of eventA at 10-29 and 10-30 while we
> have the count of eventA at 11-04-XX.
> Could the problem be that the watermark is global rather than per event
> type?
>
> Best,
>
> Sendoh
>
> --
> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Cannot-see-all-events-in-window-apply-for-big-input-tp9945p9985.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive
> at Nabble.com.
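The watermark behaviour Till describes can be sketched with a minimal simulation. This is plain Python rather than Flink's actual API, and it is a simplified per-element model: the watermark just tracks the highest event timestamp seen so far, and the zero-day allowed lateness is an assumption for illustration.

```python
from datetime import date, timedelta

# Simplified model (not Flink's windowing code): the watermark follows the
# highest timestamp seen so far; an element whose timestamp lies further
# behind the watermark than the allowed lateness is dropped.
allowed_lateness = timedelta(days=0)  # assumed zero for this sketch

events = [
    ("eventA", date(2016, 1, 1)),  # 01-01
    ("eventB", date(2016, 2, 1)),  # 02-01 -> watermark advances to W(02-01)
    ("eventC", date(2016, 1, 2)),  # 01-02 -> arrives behind the watermark
]

watermark = date.min
status = {}
for name, ts in events:
    if ts + allowed_lateness < watermark:
        status[name] = "dropped (exceeds allowed lateness)"
    elif ts <= watermark:
        status[name] = "late, but within allowed lateness"
    else:
        status[name] = "on time"
        watermark = ts  # ascending watermark derived from the event timestamp

for name, s in status.items():
    print(name, "->", s)
# eventA -> on time
# eventB -> on time
# eventC -> dropped (exceeds allowed lateness)
```

With a larger allowed lateness (here, timedelta(days=30) or more), eventC would instead be classified as late but still within the allowed lateness, which matches the point above that data is only lost once an element arrives after the allowed lateness.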