+1 in general.

What is the default in batch, though? No restarts? I always found that somewhat uncommon. Should we also change that part, if we are changing the default anyway?
On Fri, Aug 30, 2019 at 2:35 PM Till Rohrmann <trohrm...@apache.org> wrote:

> Hi everyone,
>
> I wanted to discuss how to simplify Flink's cluster level RestartStrategy
> configuration [1]. Currently, Flink's behaviour with respect to configuring
> the {{RestartStrategies}} is quite complicated and convoluted. The reason
> for this is that we evolved the way it has been configured and wanted to
> keep it backwards compatible. Due to this, we have currently the following
> behaviour:
>
> * If the config option `restart-strategy` is configured, then Flink uses
> this `RestartStrategy` (so far so simple)
> * If the config option `restart-strategy` is not configured, then
> ** If `restart-strategy.fixed-delay.attempts` or
> `restart-strategy.fixed-delay.delay` are defined, then instantiate
> `FixedDelayRestartStrategy(restart-strategy.fixed-delay.attempts,
> restart-strategy.fixed-delay.delay)`
> ** If `restart-strategy.fixed-delay.attempts` and
> `restart-strategy.fixed-delay.delay` are not defined, then
> *** If checkpointing is disabled, then choose `NoRestartStrategy`
> *** If checkpointing is enabled, then choose
> `FixedDelayRestartStrategy(Integer.MAX_VALUE, "0 s")`
>
> I would like to simplify the configuration by removing the "If
> `restart-strategy.fixed-delay.attempts` or
> `restart-strategy.fixed-delay.delay`, then" condition. That way, the logic
> would be the following:
>
> * If the config option `restart-strategy` is configured, then Flink uses
> this `RestartStrategy`
> * If the config option `restart-strategy` is not configured, then
> ** If checkpointing is disabled, then choose `NoRestartStrategy`
> ** If checkpointing is enabled, then choose
> `FixedDelayRestartStrategy(Integer.MAX_VALUE, "0 s")`
>
> That way we retain the user friendliness that jobs restart if the user
> enabled checkpointing and we make it clear that any
> `restart-strategy.fixed-delay.xyz` setting will only be respected if
> `restart-strategy` has been set to `fixed-delay`.
>
> This simplification would, however, change Flink's behaviour and might
> break existing setups. Since we introduced `RestartStrategies` with Flink
> 1.0.0 and deprecated the prior configuration mechanism which enables
> restarting if either the `attempts` or the `delay` has been set, I think
> that the number of broken jobs should be minimal if not non-existent.
>
> I'm sure that one can simplify the way RestartStrategies are
> programmatically configured as well but for the sake of simplicity/scoping
> I'd like to not touch it right away.
>
> What do you think about this behaviour change?
>
> [1] https://issues.apache.org/jira/browse/FLINK-13921
>
> Cheers,
> Till
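
For reference, the two resolution orders described in the quoted proposal can be sketched roughly as below. This is only an illustration of the decision logic, not Flink's actual implementation: the class `RestartStrategyResolution`, its methods, and the returned label strings are hypothetical, and a plain `Map` stands in for the cluster configuration. Only the configuration keys and strategy names are taken from the message.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the current vs. proposed resolution order; not Flink code. */
public class RestartStrategyResolution {

    // Illustrative labels only; real Flink would instantiate strategy objects.
    public static final String NO_RESTART = "NoRestartStrategy";
    public static final String DEFAULT_FIXED_DELAY =
            "FixedDelayRestartStrategy(Integer.MAX_VALUE, \"0 s\")";
    public static final String LEGACY_FIXED_DELAY =
            "FixedDelayRestartStrategy(from restart-strategy.fixed-delay.* options)";

    /** Current behaviour: the fixed-delay.* options alone can enable restarts. */
    public static String resolveCurrent(Map<String, String> config, boolean checkpointingEnabled) {
        String configured = config.get("restart-strategy");
        if (configured != null) {
            return configured;
        }
        boolean legacyOptionSet =
                config.containsKey("restart-strategy.fixed-delay.attempts")
                        || config.containsKey("restart-strategy.fixed-delay.delay");
        if (legacyOptionSet) {
            return LEGACY_FIXED_DELAY;
        }
        return checkpointingFallback(checkpointingEnabled);
    }

    /** Proposed behaviour: the fixed-delay.* options are ignored unless restart-strategy is set. */
    public static String resolveProposed(Map<String, String> config, boolean checkpointingEnabled) {
        String configured = config.get("restart-strategy");
        if (configured != null) {
            return configured;
        }
        return checkpointingFallback(checkpointingEnabled);
    }

    // Shared fallback when restart-strategy is not configured.
    private static String checkpointingFallback(boolean checkpointingEnabled) {
        return checkpointingEnabled ? DEFAULT_FIXED_DELAY : NO_RESTART;
    }

    public static void main(String[] args) {
        // The one case where the two orders disagree: only a fixed-delay.* option is set.
        Map<String, String> config = new HashMap<>();
        config.put("restart-strategy.fixed-delay.attempts", "3");

        System.out.println("current:  " + resolveCurrent(config, false));
        System.out.println("proposed: " + resolveProposed(config, false));
    }
}
```

The setups that would change behaviour are those that set only `restart-strategy.fixed-delay.attempts` or `restart-strategy.fixed-delay.delay` without `restart-strategy`: under the proposed order they fall through to the checkpointing-based default.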