Hi Fanbin,
> 2. I have parallelism = 32 and only one task has the record. Can you
> please elaborate more on why this would affect the watermark advancement?
Each parallel subtask of a source function usually generates its watermarks
independently, say wk1, wk2, ..., wkn. The downstream window operator
advances its own watermark to the minimum of all incoming watermarks, so if
even one subtask receives no records, its watermark never advances and holds
the whole operator's watermark back.
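To make that propagation rule concrete, here is a minimal plain-Java sketch (not Flink API; class and method names are illustrative) of an operator that combines the watermarks of its 32 upstream subtasks:

```java
import java.util.Arrays;

public class WatermarkMin {
    // one current watermark per upstream subtask
    private final long[] inputWatermarks;

    public WatermarkMin(int numInputs) {
        inputWatermarks = new long[numInputs];
        Arrays.fill(inputWatermarks, Long.MIN_VALUE); // nothing received yet
    }

    public void onWatermark(int input, long wm) {
        inputWatermarks[input] = Math.max(inputWatermarks[input], wm);
    }

    // the operator's watermark is the minimum across all inputs
    public long currentWatermark() {
        return Arrays.stream(inputWatermarks).min().getAsLong();
    }

    public static void main(String[] args) {
        WatermarkMin op = new WatermarkMin(32);
        // only subtask 0 has records and advances its watermark
        op.onWatermark(0, 1_000_000L);
        // the combined watermark is still Long.MIN_VALUE: the 31 idle
        // subtasks never advanced, so downstream windows never fire
        System.out.println(op.currentWatermark() == Long.MIN_VALUE); // prints true
    }
}
```

With parallelism = 32 and records in only one task, this is exactly the situation: 31 inputs stay at their initial watermark and the minimum never moves.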
Hequn,
Thanks for the help. It is indeed a watermark problem. In the Flink UI, I can
see the low-watermark value for each operator, and the groupBy operator has a
lagging watermark. I checked the link from SO and confirmed that:
1. I do see record coming in for this operator
2. I have parallelism = 32 and only one task has the record
Hi Fanbin,
Fabian is right, it should be a watermark problem. Probably, some tasks of
the source don't have enough data to advance the watermark. You can also
monitor event-time progress through the Flink web interface.
I have answered a similar question on Stack Overflow; see more details
here [1].
If I use proctime, the groupBy happens without any delay.
On Tue, Jul 23, 2019 at 10:16 AM Fanbin Bu wrote:
> not sure whether this is related:
>
> public SingleOutputStreamOperator<T> assignTimestampsAndWatermarks(
>         AssignerWithPeriodicWatermarks<T> timestampAndWatermarkAssigner) {
>
>     // match parallelism to input, otherwise dop=1 sources could lead to some strange
>     // behaviour: the watermark will creep along very slowly because the elements
>     // would be spread out evenly between the parallel subtasks
not sure whether this is related:
public SingleOutputStreamOperator<T> assignTimestampsAndWatermarks(
        AssignerWithPeriodicWatermarks<T> timestampAndWatermarkAssigner) {

    // match parallelism to input, otherwise dop=1 sources could lead to some strange
    // behaviour: the watermark will creep along very slowly because the elements
    // would be spread out evenly between the parallel subtasks
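The "creeping" behaviour that comment warns about can be simulated without Flink. In this illustrative sketch, a dop=1 source round-robins timestamped elements across 32 watermark-assigner subtasks; each subtask's watermark tracks only the elements it saw, so the combined (minimum) watermark lags far behind the newest element:

```java
public class CreepingWatermark {
    // combined (minimum) watermark after `count` timestamps 1..count have
    // been distributed round-robin across `parallelism` subtasks
    static long combinedWatermark(int parallelism, long count) {
        long[] maxTsPerSubtask = new long[parallelism];
        for (long ts = 1; ts <= count; ts++) {
            int subtask = (int) ((ts - 1) % parallelism);
            maxTsPerSubtask[subtask] = Math.max(maxTsPerSubtask[subtask], ts);
        }
        long min = Long.MAX_VALUE;
        for (long m : maxTsPerSubtask) min = Math.min(min, m);
        return min;
    }

    public static void main(String[] args) {
        // the newest element has ts = 64, but the combined watermark is
        // only 33: subtask 0 saw just ts 1 and ts 33
        System.out.println(CreepingWatermark.combinedWatermark(32, 64)); // prints 33

        // with matched parallelism (1), the watermark keeps up
        System.out.println(CreepingWatermark.combinedWatermark(1, 64)); // prints 64
    }
}
```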
Thanks Fabian for the prompt reply. I just started using Flink and this is
a great community.
The watermark setting only accounts for a 10-second delay. Besides that, the
local job in IntelliJ runs fine without issues.
Here is the code:
class EventTimestampExtractor(slack: Long = 0L) extends
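The class body is cut off here, but extractors of this shape typically track the highest timestamp seen and emit a watermark that trails it by the slack. A minimal plain-Java sketch of that logic (illustrative names, no Flink dependencies, assuming that is what EventTimestampExtractor does):

```java
public class BoundedLatenessWatermark {
    private final long slack;              // allowed out-of-orderness, ms
    private long maxTs = Long.MIN_VALUE;   // highest event timestamp seen

    public BoundedLatenessWatermark(long slack) { this.slack = slack; }

    // called for each element with its event timestamp
    public long extractTimestamp(long eventTs) {
        maxTs = Math.max(maxTs, eventTs);
        return eventTs;
    }

    // periodic watermark: trail the max timestamp by the slack
    public long currentWatermark() {
        return maxTs == Long.MIN_VALUE ? Long.MIN_VALUE : maxTs - slack;
    }

    public static void main(String[] args) {
        BoundedLatenessWatermark wm = new BoundedLatenessWatermark(10_000L); // 10 s
        wm.extractTimestamp(100_000L);
        wm.extractTimestamp(95_000L);  // a late element does not move the max
        System.out.println(wm.currentWatermark()); // prints 90000
    }
}
```

Note that the watermark only advances when new, larger timestamps actually arrive at that subtask, which is why idle subtasks stall it.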
Hi Fanbin,
The delay is most likely caused by the watermark delay.
A window is computed when the watermark passes the end of the window. If
you configured the watermark to be 10 minutes before the current max
timestamp (probably to account for out-of-order data), then the window will
be computed with a delay of 10 minutes: the operator first has to see data
that is 10 minutes newer than the window's end.
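That firing condition can be sketched in a few lines (plain Java, illustrative names, not Flink code):

```java
public class WindowFiring {
    // a window covering [start, end) fires once the watermark reaches end
    static boolean fires(long windowEnd, long watermark) {
        return watermark >= windowEnd;
    }

    public static void main(String[] args) {
        long windowEnd = 60_000L;             // window [0 s, 60 s)
        long delay = 600_000L;                // 10-minute watermark delay
        long maxSeenTs = 120_000L;            // newest element: 2 min
        long watermark = maxSeenTs - delay;   // far behind the window end

        System.out.println(WindowFiring.fires(windowEnd, watermark)); // prints false

        // only after an element at windowEnd + delay (11 min) arrives
        // does the watermark pass the window end and the window fire
        maxSeenTs = windowEnd + delay;
        watermark = maxSeenTs - delay;        // = 60000
        System.out.println(WindowFiring.fires(windowEnd, watermark)); // prints true
    }
}
```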
Hi,
I have a Flink sql streaming job defined by:
SELECT
user_id
, hop_end(created_at, interval '30' second, interval '1' minute) as bucket_ts
, count(name) as count
FROM event
WHERE name = 'signin'
GROUP BY
user_id
, hop(created_at, interval '30' second, interval '1' minute)
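For reference, the HOP(created_at, INTERVAL '30' SECOND, INTERVAL '1' MINUTE) grouping above assigns each event to size/slide = 2 overlapping windows. A plain-Java sketch of that assignment (illustrative, not Flink's implementation; assumes non-negative epoch-millis timestamps):

```java
public class HopWindows {
    // start timestamps of all hop windows [start, start + sizeMs) that
    // contain ts, for a hop of the given slide and size
    static long[] windowStarts(long ts, long slideMs, long sizeMs) {
        int n = (int) (sizeMs / slideMs);     // windows per event
        long lastStart = ts - (ts % slideMs); // latest window start <= ts
        long[] starts = new long[n];
        for (int i = 0; i < n; i++) starts[i] = lastStart - i * slideMs;
        return starts;
    }

    public static void main(String[] args) {
        // event at t = 75 s, slide 30 s, size 60 s
        for (long start : HopWindows.windowStarts(75_000L, 30_000L, 60_000L)) {
            System.out.println("[" + start + ", " + (start + 60_000L) + ")");
        }
        // prints [60000, 120000) and [30000, 90000)
    }
}
```

Each of those windows is only emitted once the watermark passes its end, which is why watermark problems show up as a delayed (or missing) groupBy output.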
there is a delay before the groupBy results are emitted.