-1 on increasing the default delay to a non-zero value, for the reasons below:

a) I could see some concerns about setting the delay to zero in the original JIRA (FLINK-2993 <https://issues.apache.org/jira/browse/FLINK-2993>), but later on, in FLINK-9158 <https://issues.apache.org/jira/browse/FLINK-9158>, we still decided to make the change, so I'm wondering whether that decision also came from a customer requirement. If so, how do we judge whether one requirement overrides the other?
b) There could be valid reasons for both default values depending on the use case, as well as workarounds (for example, under the latest policy, manually setting the config to 10 s would resolve the problem mentioned), and from earlier replies in this thread we can see that users have already taken such action. Changing the default back to non-zero won't affect those users, but it might surprise those who depend on 0 as the default.

Last but not least, no matter what decision we make this time, I'd suggest making it final and documenting it explicitly in our release notes. Checking the 1.5.0 release notes [1] [2], it seems we didn't mention the change to the default restart delay, and we'd better learn from that this time.

Thanks.

[1] https://flink.apache.org/news/2018/05/25/release-1.5.0.html#release-notes
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.5/release-notes/flink-1.5.html

Best Regards,
Yu

On Sun, 1 Sep 2019 at 04:33, Steven Wu <stevenz...@gmail.com> wrote:

> +1 on what Zhu Zhu said.
>
> We also override the default to 10 s.
>
> On Fri, Aug 30, 2019 at 8:58 PM Zhu Zhu <reed...@gmail.com> wrote:
>
>> In our production, we usually override the restart delay to be 10 s.
>> We once encountered cases where external services were overwhelmed by
>> reconnections from frequently restarted tasks.
>> As a safer though not optimized option, a default delay larger than 0 s
>> is better in my opinion.
>>
>>
>> 未来阳光 <2217232...@qq.com> wrote on Fri, Aug 30, 2019 at 10:23 PM:
>>
>>> Hi,
>>>
>>> I think it's better to increase the default value. +1
>>>
>>> Best.
>>>
>>> ------------------ Original Message ------------------
>>> From: "Till Rohrmann"<trohrm...@apache.org>;
>>> Sent: Friday, August 30, 2019, 10:07 PM
>>> To: "dev"<d...@flink.apache.org>; "user"<user@flink.apache.org>;
>>> Subject: [SURVEY] Is the default restart delay of 0s causing problems?
>>>
>>> Hi everyone,
>>>
>>> I wanted to reach out to you and ask whether decreasing the default delay
>>> to `0 s` for the fixed delay restart strategy [1] is causing trouble. A
>>> user reported that he would like to increase the default value because it
>>> can cause restart storms in case of systematic faults [2].
>>>
>>> The downside of increasing the default delay would be a slightly increased
>>> restart time if this config option is not explicitly set.
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-9158
>>> [2] https://issues.apache.org/jira/browse/FLINK-11218
>>>
>>> Cheers,
>>> Till