I guess that most things have already been said on the related discussion thread [1]. Hence I will start a vote about this FLIP.
[1] https://lists.apache.org/thread.html/80bef7146f9696f35b1e50ff4acdd1cc3e87ae6f212d205aa7a72182@%3Cdev.flink.apache.org%3E Cheers, Till On Mon, Sep 2, 2019 at 10:56 PM Till Rohrmann <trohrm...@apache.org> wrote: > Hi everyone, > > I'd like to discuss FLIP-61 [1] which tries to simplify Flink's cluster > lever RestartStrategy configuration. > > Currently, Flink's behaviour with respect to configuring the > `RestartStrategies` is quite complicated and convoluted. The reason for > this is that we evolved the way it has been configured and wanted to keep > it backwards compatible. Due to this, we have currently the following > behaviour: > > * If the config option `restart-strategy` is configured, then Flink uses > this `RestartStrategy` (so far so simple) > * If the config option `restart-strategy` is not configured, then > ** If `restart-strategy.fixed-delay.attempts` or > `restart-strategy.fixed-delay.delay` are defined, then instantiate > `FixedDelayRestartStrategy(restart-strategy.fixed-delay.attempts, > restart-strategy.fixed-delay.delay)` > ** If `restart-strategy.fixed-delay.attempts` and > `restart-strategy.fixed-delay.delay` are not defined, then > *** If checkpointing is disabled, then choose `NoRestartStrategy` > *** If checkpointing is enabled, then choose > `FixedDelayRestartStrategy(Integer.MAX_VALUE, "0 s")` > > I would like to simplify the configuration by removing the "If > `restart-strategy.fixed-delay.attempts` or > `restart-strategy.fixed-delay.delay`, then" condition. That way, the logic > would be the following: > > * If the config option `restart-strategy` is configured, then Flink uses > this `RestartStrategy` > * If the config option `restart-strategy` is not configured, then > ** If checkpointing is disabled, then choose `NoRestartStrategy` > ** If checkpointing is enabled, then choose > `FixedDelayRestartStrategy(Integer.MAX_VALUE, "0 s")` > > That way we retain the user friendliness that jobs restart if the user > enabled checkpointing and we make it clear that any ` > restart-strategy.fixed-delay.xyz` setting will only be respected if > `restart-strategy` has been set to `fixed-delay`. > > This simplification would, however, change Flink's behaviour and might > break existing setups. Since we introduced `RestartStrategies` with Flink > 1.0.0 and deprecated the prior configuration mechanism which enables > restarting if either the `attempts` or the `delay` has been set, I think > that the number of broken jobs should be minimal if not non-existent. > > There has been a previous discuss thread which is now being abandoned [2]. > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-61+Simplify+Flink%27s+cluster+level+RestartStrategy+configuration > [2] > https://lists.apache.org/thread.html/80bef7146f9696f35b1e50ff4acdd1cc3e87ae6f212d205aa7a72182@%3Cdev.flink.apache.org%3E > > Cheers, > Till >