Everybody,
Thank you for the quick response.
Yes, we inadvertently used the -d/--drain flag when stopping the job. We
were not aware that it would cause a MAX_WATERMARK to roll through our
system.
MAX_WATERMARKS are catastrophic for the event time timers we have in our
system.
We know now never
the infinite iteration over timers …
I believe the behavior exhibited by flink is intentional and no defect!
What do you think?
Thias
From: JING ZHANG
Sent: Freitag, 24. September 2021 12:25
To: Guowei Ma
Cc: Marco Villalobos ; user
Subject: Re: stream processing savepoints and watermarks quest
Hi Guowei,
Thanks for quick response, maybe I didn't express it clearly in the last
email.
In fact, above case happened in reality, not what I imagined.
When MAX_WATERMARK is received, the operator will try to fire all
registered event-time timers. However in the above case, new timers are
continuo
Hi, JING
Thanks for the case.
But I am not sure this would happen. As far as I know the event timer could
only be triggered when there is a watermark (except the "quiesce phase").
I think it could not advance any watermarks after MAX_WATERMARK is received.
Best,
Guowei
On Fri, Sep 24, 2021 at 4
Hi Guowei,
I could provide a case that I have encountered which timers to fire
indefinitely when doing drain savepoint.
After an event timer is triggered, it registers another event timer
whose value equals the value of triggered timer plus an interval time.
If a MAX_WATERMARK comes, the timer is t
Hi Macro
Indeed, as mentioned by JING, if you want to drain when triggering
savepoint, you will encounter this MAX_WATERMARK.
But I have a problem. In theory, even with MAX_WATERMARK, there will not be
an infinite number of timers. And these timers should be generated by the
application code.
You
Hi Macro,
Do you specified drain flag when stop a job with a savepoint?
If the --drain flag is specified, then a MAX_WATERMARK will be emitted
before the last checkpoint barrier.
[1]
https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/cli/#stopping-a-job-gracefully-creating-a-fi
Something strange happened today.
When we tried to shutdown a job with a savepoint, the watermarks became
equal to 2^63 - 1.
This caused timers to fire indefinitely and crash downstream systems with
overloaded untrue data.
We are using event time processing with Kafka as our source.
It seems imp