Hi everyone,

I'd like to discuss changing the default restart delay for FixedDelay- and
FailureRateRestartStrategy to "1 s" [1].

A user survey about the default value of the restart delay [2] showed that
the current default of "0 s" is not optimal. In practice, Flink users tend
to set it to a non-zero value (e.g. "10 s") to prevent restart storms
caused by overloaded external systems.

I would like to set the default restart delay of the
FixedDelayRestartStrategy ("restart-strategy.fixed-delay.delay") and of the
FailureRateRestartStrategy ("restart-strategy.failure-rate.delay") to
"1 s". This should prevent restart storms originating from causes outside
of Flink (e.g. overloaded external systems) while still being short enough
not to have a noticeable effect on most Flink deployments.

However, this change will affect all users who rely on the current default
restart delay ("0 s"). The plan is to add a release note to make these
users aware of the change when upgrading Flink.
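
For anyone who wants to keep today's behavior after upgrading, here is a
minimal sketch of pinning the delay explicitly. It assumes the setting goes
into flink-conf.yaml and that the strategy is selected with the
"restart-strategy" key; neither is spelled out above, and the attempts value
is only a placeholder:

```yaml
# Sketch only: keep the old "0 s" delay for the fixed-delay strategy.
restart-strategy: fixed-delay
# Placeholder value, not a recommendation.
restart-strategy.fixed-delay.attempts: 3
# Explicitly restore the current default so the new "1 s" does not apply.
restart-strategy.fixed-delay.delay: 0 s
```

Jobs using the failure-rate strategy would set
"restart-strategy.failure-rate.delay" in the same way.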

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-62%3A+Set+default+restart+delay+for+FixedDelay-+and+FailureRateRestartStrategy+to+1s
[2]
https://lists.apache.org/thread.html/107b15de6b8ac849610d99c4754715d2a8a2f32ddfe9f8da02af2ccc@%3Cdev.flink.apache.org%3E

Cheers,
Till
