Till Rohrmann created FLINK-13921:
-------------------------------------

             Summary: Simplify cluster level RestartStrategy configuration
                 Key: FLINK-13921
                 URL: https://issues.apache.org/jira/browse/FLINK-13921
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Coordination
    Affects Versions: 1.10.0
            Reporter: Till Rohrmann
             Fix For: 1.10.0


Currently, Flink's behaviour with respect to configuring the 
{{RestartStrategies}} is quite complicated and convoluted. The reason is that 
the way it is configured has evolved over time and we wanted to keep it 
backwards compatible. As a result, we currently have the following behaviour:

* If the config option {{restart-strategy}} is configured, then Flink uses this 
{{RestartStrategy}} (so far, so simple :-))
* If the config option {{restart-strategy}} is not configured, then 
** If {{restart-strategy.fixed-delay.attempts}} or 
{{restart-strategy.fixed-delay.delay}} is defined, then Flink instantiates 
{{FixedDelayRestartStrategy(restart-strategy.fixed-delay.attempts, 
restart-strategy.fixed-delay.delay)}}
** If neither {{restart-strategy.fixed-delay.attempts}} nor 
{{restart-strategy.fixed-delay.delay}} is defined, then
*** If checkpointing is disabled, then choose {{NoRestartStrategy}}
*** If checkpointing is enabled, then choose 
{{FixedDelayRestartStrategy(Integer.MAX_VALUE, "0 s")}}
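To make the current decision tree concrete, here is a minimal sketch of it in Java. It is not Flink's actual code: the class {{LegacyRestartStrategyResolver}} and its {{resolve}} method are hypothetical, the cluster configuration is modelled as a plain {{Map<String, String>}}, and the chosen strategy is returned as a descriptive string. The issue does not state which values are used when only one of the two fixed-delay keys is set, so those are shown as placeholders.

```java
import java.util.Map;

// Hypothetical model of the CURRENT restart strategy fallback logic.
// Not Flink's API: config is a plain map, the result is a descriptive string.
final class LegacyRestartStrategyResolver {

    static final String STRATEGY = "restart-strategy";
    static final String ATTEMPTS = "restart-strategy.fixed-delay.attempts";
    static final String DELAY = "restart-strategy.fixed-delay.delay";

    static String resolve(Map<String, String> config, boolean checkpointingEnabled) {
        String strategy = config.get(STRATEGY);
        if (strategy != null) {
            // An explicitly configured strategy always wins.
            return strategy;
        }
        if (config.containsKey(ATTEMPTS) || config.containsKey(DELAY)) {
            // Legacy fallback: setting either fixed-delay key implicitly enables
            // fixed-delay restarts. Placeholders stand in for unstated defaults.
            return "FixedDelayRestartStrategy("
                    + config.getOrDefault(ATTEMPTS, "<default attempts>") + ", "
                    + config.getOrDefault(DELAY, "<default delay>") + ")";
        }
        // No restart settings at all: decide based on checkpointing.
        return checkpointingEnabled
                ? "FixedDelayRestartStrategy(" + Integer.MAX_VALUE + ", 0 s)"
                : "NoRestartStrategy";
    }

    public static void main(String[] args) {
        // Only a fixed-delay key is set, yet restarts are enabled.
        System.out.println(resolve(Map.of(ATTEMPTS, "3", DELAY, "10 s"), false));
    }
}
```

The second branch is the one this issue proposes to drop.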

I would like to simplify the configuration by removing the condition "If 
{{restart-strategy.fixed-delay.attempts}} or 
{{restart-strategy.fixed-delay.delay}} is defined, then". That way, the logic 
would be the following:

* If the config option {{restart-strategy}} is configured, then Flink uses this 
{{RestartStrategy}} (so far, so simple :-))
* If the config option {{restart-strategy}} is not configured, then 
** If checkpointing is disabled, then choose {{NoRestartStrategy}}
** If checkpointing is enabled, then choose 
{{FixedDelayRestartStrategy(Integer.MAX_VALUE, "0 s")}}
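The proposed logic can be sketched the same way. Again, {{SimplifiedRestartStrategyResolver}} is a hypothetical name, the configuration is modelled as a {{Map<String, String>}}, and placeholders stand in for default values the issue does not specify. The key difference is that the {{restart-strategy.fixed-delay.*}} keys are only read when {{restart-strategy}} is explicitly set to {{fixed-delay}}.

```java
import java.util.Map;

// Hypothetical model of the PROPOSED (simplified) restart strategy logic.
// Not Flink's API: config is a plain map, the result is a descriptive string.
final class SimplifiedRestartStrategyResolver {

    static final String STRATEGY = "restart-strategy";
    static final String ATTEMPTS = "restart-strategy.fixed-delay.attempts";
    static final String DELAY = "restart-strategy.fixed-delay.delay";

    static String resolve(Map<String, String> config, boolean checkpointingEnabled) {
        String strategy = config.get(STRATEGY);
        if (strategy != null) {
            if (strategy.equals("fixed-delay")) {
                // The only place where the fixed-delay keys are consulted.
                return "FixedDelayRestartStrategy("
                        + config.getOrDefault(ATTEMPTS, "<default attempts>") + ", "
                        + config.getOrDefault(DELAY, "<default delay>") + ")";
            }
            return strategy;
        }
        // restart-strategy is unset: fixed-delay keys are ignored entirely.
        return checkpointingEnabled
                ? "FixedDelayRestartStrategy(" + Integer.MAX_VALUE + ", 0 s)"
                : "NoRestartStrategy";
    }

    public static void main(String[] args) {
        // Same settings as a legacy setup, but without restart-strategy: no restarts.
        System.out.println(resolve(Map.of(ATTEMPTS, "3", DELAY, "10 s"), false));
    }
}
```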

That way, we retain the user-friendly behaviour that jobs restart automatically 
when checkpointing is enabled, and we make it clear that any 
{{restart-strategy.fixed-delay}} setting is only respected if 
{{restart-strategy}} has been set to {{fixed-delay}}.

This simplification would, however, change Flink's behaviour and might break 
existing setups. Since we introduced {{RestartStrategies}} in Flink {{1.0.0}} 
and deprecated the prior configuration mechanism, which enabled restarting if 
either the {{attempts}} or the {{delay}} had been set, I think that the number 
of broken jobs should be minimal, if not zero.
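Only configurations that relied on the deprecated fallback change behaviour: {{restart-strategy}} unset, but at least one {{restart-strategy.fixed-delay.*}} key present. The following sketch of such a check is an assumption for illustration; {{RestartStrategyMigrationCheck}} is a hypothetical helper, not an existing Flink utility, and the configuration is again modelled as a {{Map<String, String>}}.

```java
import java.util.Map;

// Hypothetical helper (not part of Flink) that flags configurations whose
// restart behaviour would change under the proposed simplification.
final class RestartStrategyMigrationCheck {

    static boolean reliesOnLegacyFallback(Map<String, String> config) {
        // Affected: no explicit strategy, but fixed-delay keys are present and
        // were previously used to enable fixed-delay restarts implicitly.
        return !config.containsKey("restart-strategy")
                && (config.containsKey("restart-strategy.fixed-delay.attempts")
                        || config.containsKey("restart-strategy.fixed-delay.delay"));
    }

    public static void main(String[] args) {
        Map<String, String> config = Map.of("restart-strategy.fixed-delay.attempts", "3");
        if (reliesOnLegacyFallback(config)) {
            System.out.println("Set restart-strategy: fixed-delay to keep the current behaviour.");
        }
    }
}
```

For an affected setup, explicitly setting {{restart-strategy}} to {{fixed-delay}} would preserve the previous restart behaviour under the new logic.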



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
