In our production environment, we usually override the restart delay to 10 s. We have encountered cases where external services were overwhelmed by reconnections from frequently restarting tasks. In my opinion, a default delay larger than 0 s is the safer option, even if it is not the optimal one.
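
An override like this could look roughly as follows in `flink-conf.yaml`. This is only a sketch: it assumes the fixed delay restart strategy is the one in use, and the attempts value is a placeholder rather than a recommendation.

```yaml
# Sketch only: assumes the fixed delay restart strategy.
restart-strategy: fixed-delay
# Placeholder value for illustration; choose what fits the job.
restart-strategy.fixed-delay.attempts: 3
# Wait 10 s between restart attempts instead of restarting immediately.
restart-strategy.fixed-delay.delay: 10 s
```

With a non-zero delay, a systematic fault leads to spaced-out reconnection attempts against external services instead of an immediate burst of restarts.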

未来阳光 <2217232...@qq.com> wrote on Fri, Aug 30, 2019 at 10:23 PM:

> Hi,
>
> I think it's better to increase the default value. +1
>
> Best.
>
> ------------------ Original Message ------------------
> From: "Till Rohrmann" <trohrm...@apache.org>;
> Sent: Friday, August 30, 2019, 10:07 PM
> To: "dev" <d...@flink.apache.org>; "user" <user@flink.apache.org>;
> Subject: [SURVEY] Is the default restart delay of 0s causing problems?
>
> Hi everyone,
>
> I wanted to reach out to you and ask whether decreasing the default delay
> to `0 s` for the fixed delay restart strategy [1] is causing trouble. A
> user reported that he would like to increase the default value because it
> can cause restart storms in case of systematic faults [2].
>
> The downside of increasing the default delay would be a slightly increased
> restart time if this config option is not explicitly set.
>
> [1] https://issues.apache.org/jira/browse/FLINK-9158
> [2] https://issues.apache.org/jira/browse/FLINK-11218
>
> Cheers,
> Till