Hi,
We let the watermark proceed at the earliest timestamp among all event types.
Our test results look correct.
/*
 * Watermark proceeds at the earliest timestamp among all the event types
 */
public class EventsWatermark implements
        AssignerWithPeriodicWatermarks<JsonNode> {  // element type was lost in the archive; JsonNode is an assumption

    private final long maxTimeLag;
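For what it's worth, a possible completion of such an assigner. This is only a minimal sketch: the JsonNode element type, the "eventType"/"timestamp" field names, and the per-type minimum tracking are my assumptions, not the original code.

import java.util.HashMap;
import java.util.Map;

import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;

import com.fasterxml.jackson.databind.JsonNode;

public class EventsWatermark implements AssignerWithPeriodicWatermarks<JsonNode> {

    private final long maxTimeLag;

    // highest timestamp seen so far, tracked per event type (assumed "eventType" field)
    private final Map<String, Long> maxTimestampPerType = new HashMap<>();

    public EventsWatermark(long maxTimeLag) {
        this.maxTimeLag = maxTimeLag;
    }

    @Override
    public long extractTimestamp(JsonNode event, long previousElementTimestamp) {
        long timestamp = event.get("timestamp").asLong();
        maxTimestampPerType.put(event.get("eventType").asText(), timestamp);
        return timestamp;
    }

    @Override
    public Watermark getCurrentWatermark() {
        // advance only as far as the slowest (earliest) event type, minus the allowed lag
        long earliest = Long.MAX_VALUE;
        for (long ts : maxTimestampPerType.values()) {
            earliest = Math.min(earliest, ts);
        }
        return earliest == Long.MAX_VALUE ? null : new Watermark(earliest - maxTimeLag);
    }
}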
Flink does not support per-key watermarks or type-sensitive watermarks. The
underlying assumption is that you have a global watermark which defines the
progress with respect to event time in your topology.
The easiest way would be to have an input which has a monotonically
increasing timestamp. Alternative
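In case it helps, a minimal sketch of the monotonically increasing timestamp approach mentioned above, assuming the jsonStreams DataStream from the snippet below and a "timestamp" field in each JSON record (both are my assumptions; imports omitted):

DataStream<JsonNode> withTimestamps = jsonStreams
        .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<JsonNode>() {
            @Override
            public long extractAscendingTimestamp(JsonNode event) {
                // only valid if timestamps never decrease within each parallel source partition
                return event.get("timestamp").asLong();
            }
        });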
Thank you for confirming.
What would you think would be an efficient way of not having a global watermark? The
following logic fails to build a watermark per keyed stream:
jsonStreams.keyBy(new JsonKeySelector()).assignTimestampsAndWatermarks(new
JsonWatermark()).keyBy(new JsonKeySelector()).window(
So, using split
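If split() is the direction, a rough sketch of what that could look like, assuming an "eventType" field in the JSON and reusing the JsonWatermark assigner per branch (field and output names are my assumptions; imports omitted):

SplitStream<JsonNode> byType = jsonStreams.split(new OutputSelector<JsonNode>() {
    @Override
    public Iterable<String> select(JsonNode event) {
        // route each record to an output named after its event type
        return Collections.singletonList(event.get("eventType").asText());
    }
});

DataStream<JsonNode> eventA = byType.select("eventA")
        .assignTimestampsAndWatermarks(new JsonWatermark());
DataStream<JsonNode> eventB = byType.select("eventB")
        .assignTimestampsAndWatermarks(new JsonWatermark());

Each selected branch then gets its own timestamp/watermark assigner; if the branches are unioned or connected again later, the combined watermark falls back to the minimum across the inputs.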
Hi Sendoh,
Flink should actually never lose data unless it is so late that it arrives
after the allowed lateness. This should be independent of the total data
size.
The watermarks are indeed global and not bound to a specific input element
or a group. So for example if you create the watermarks f
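For reference, allowed lateness is configured on the window itself (available since 1.1). A sketch along the lines of the snippets below, where CountPerTypeWindowFunction is just a hypothetical placeholder:

jsonStreams
    .keyBy(new JsonKeySelector())
    .window(TumblingEventTimeWindows.of(Time.days(1)))
    .allowedLateness(Time.hours(6))   // late records within 6h of the window end still update the result; anything later is dropped
    .apply(new CountPerTypeWindowFunction());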
Hi,
Would the issue be that events are too out of order and the watermark is
global?
We want to count events per event type per day, and the data looks like:
eventA, 10-29-XX
eventB, 11-02-XX
eventB, 11-02-XX
eventB, 11-03-XX
eventB, 11-04-XX
eventA, 10-29-XX
eventA, 10-30-XX
eventA, 1
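For reference, one way such a count per event type per day could be expressed, assuming JsonKeySelector returns the event type as a String; the window function is my own sketch, not code from this thread (imports omitted):

jsonStreams
    .keyBy(new JsonKeySelector())                        // key = event type (assumed)
    .window(TumblingEventTimeWindows.of(Time.days(1)))   // one event-time window per day
    .apply(new WindowFunction<JsonNode, Tuple3<String, Long, Long>, String, TimeWindow>() {
        @Override
        public void apply(String eventType, TimeWindow window, Iterable<JsonNode> events,
                          Collector<Tuple3<String, Long, Long>> out) {
            long count = 0;
            for (JsonNode ignored : events) {
                count++;
            }
            // emit (event type, window start, count)
            out.collect(new Tuple3<>(eventType, window.getStart(), count));
        }
    });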
Yes, the other job also performs an event time window, and we tried 1.2-SNAPSHOT and
1.1.3. With the old version 1.0.3 we lost much, much less data. We already tried both
windowAll() and keyBy().window(), and also tried a very tiny lag and a
window of 1 millisecond.
My doubt comes from the fact that a smaller input works while a bigger input does not.
And this other job also performs a window operation based on event time?
What do you mean by “I have a doubt is the necessary parallelism for
window operation if reprocessing a skew input from Kafka”?
Also be aware that the windowAll operation is executed with a degree of
parallelism (DOP) of 1, making it effectively non-parallel.
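To make that concrete, a small sketch of the difference, reusing jsonStreams and JsonKeySelector from the earlier snippets (the String key type is an assumption):

// non-parallel: every element passes through a single window operator instance
AllWindowedStream<JsonNode, TimeWindow> allEvents =
        jsonStreams.windowAll(TumblingEventTimeWindows.of(Time.days(1)));

// parallel: window state is partitioned by key (here: the event type)
WindowedStream<JsonNode, String, TimeWindow> perType =
        jsonStreams.keyBy(new JsonKeySelector())
                   .window(TumblingEventTimeWindows.of(Time.days(1)));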
Hi Till,
Thank you for the suggestions. We know the timestamps are correct because another
Flink job is running on the same three topics correctly. We also know the
operators work well before the window apply(), because we check the results before
the window apply().
What I currently have a doubt about is the necessary parallelism for the
window operation when reprocessing a skewed input from Kafka.
Hi Sendoh,
from your description it's really hard to figure out what the problem could
be. The first thing to do would be to check how many records you actually
consume from Kafka and how many records are output. Next I would take a
look at the timestamp extractor. Can it be that records are discarded?
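One simple way to get such an in/out count, in case it helps; the accumulator name is made up and the map is just a pass-through counter:

DataStream<JsonNode> counted = jsonStreams.map(new RichMapFunction<JsonNode, JsonNode>() {
    private final LongCounter consumed = new LongCounter();

    @Override
    public void open(Configuration parameters) {
        // register the counter so it shows up in the web UI / job result
        getRuntimeContext().addAccumulator("records-consumed", consumed);
    }

    @Override
    public JsonNode map(JsonNode event) {
        consumed.add(1L);
        return event;
    }
});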