Hello everybody,

I was wondering: are there any best practices for setting the `yarn.application-attempts` configuration key when running Flink on YARN as a long-running session? The configuration page in the docs states that the default is 1 and recommends leaving it that way. For a long-running session, however, it seems to me that the value should be higher, so that the session can actually keep running when a Job Manager fails.
Furthermore, the HA page in the docs states the following:

"""
It’s important to note that yarn.resourcemanager.am.max-attempts is an upper bound for the application restarts. Therefore, the number of application attempts set within Flink cannot exceed the YARN cluster setting with which YARN was started.
"""

However, after some tests conducted by my colleagues, and after looking at the code (FlinkYarnClientBase:522-536), it seems to me that the `yarn.application-attempts` key in flink-conf.yaml, if set, overrides the value from yarn-site.xml, which in turn overrides the fallback value of 1.

Is this right? Is the documentation wrong?

--
BR,
Stefano Baghino
Software Engineer @ Radicalbit
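
P.S. To make the question concrete, here is a minimal sketch of the two behaviours I am comparing, written in Python for brevity. It is not the actual Flink code: the function and parameter names are hypothetical, and the "documented" variant is only my reading of the HA page. `None` stands for "key not set".

```python
from typing import Optional

# Fallback used when neither file sets a value (the default of 1 mentioned above).
FALLBACK_ATTEMPTS = 1


def attempts_as_observed(flink_conf: Optional[int], yarn_site: Optional[int]) -> int:
    """Precedence as it appears from our tests: flink-conf.yaml wins if set,
    otherwise yarn-site.xml, otherwise the fallback."""
    if flink_conf is not None:
        return flink_conf
    if yarn_site is not None:
        return yarn_site
    return FALLBACK_ATTEMPTS


def attempts_as_documented(flink_conf: Optional[int], yarn_site: Optional[int]) -> int:
    """My reading of the HA page: the YARN cluster setting is an upper bound
    on whatever Flink asks for."""
    requested = flink_conf if flink_conf is not None else FALLBACK_ATTEMPTS
    if yarn_site is None:
        return requested
    return min(requested, yarn_site)


if __name__ == "__main__":
    # Hypothetical values: Flink asks for 10 attempts, the cluster allows 2.
    print(attempts_as_observed(10, 2))
    print(attempts_as_documented(10, 2))
```

With the hypothetical values above (10 in flink-conf.yaml, 2 in yarn-site.xml), the first function returns 10 and the second returns 2, which is exactly the discrepancy I am asking about.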