Hello everybody,

I was wondering: are there any best practices for setting the
`yarn.application-attempts` configuration key when running Flink on YARN as
a long-running session? The configuration page in the docs states that 1 is
the default and recommends leaving it at that. However, for a long-running
session it seems to me that the value should be higher, so that the session
can actually keep running when a JobManager fails.

Furthermore, the HA page in the docs states the following:

"""
It’s important to note that yarn.resourcemanager.am.max-attempts is an
upper bound for the application restarts. Therefore, the number of
application attempts set within Flink cannot exceed the YARN cluster
setting with which YARN was started.
"""

However, after some tests conducted by my colleagues and after looking at
the code (FlinkYarnClientBase:522-536), it seems to me that the
`yarn.application-attempts` key in flink-conf.yaml, if set, overrides the
value from yarn-site.xml, which in turn overrides the fallback value of 1.
In other words, the YARN setting does not appear to act as an upper bound.
Is this right? Is the documentation wrong?
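
To make the precedence described above concrete, here is a minimal sketch
of how I read it. This is not the actual Flink code: the class and method
names are hypothetical and the two configuration files are modeled as plain
maps. Only the two configuration keys and the fallback of 1 come from the
docs and from what is described above.

```java
import java.util.Map;

// Hypothetical illustration of the lookup order described above;
// NOT copied from FlinkYarnClientBase.
class AttemptsResolution {
    static final String FLINK_KEY = "yarn.application-attempts";
    static final String YARN_KEY = "yarn.resourcemanager.am.max-attempts";
    static final int FALLBACK = 1;

    // flinkConf stands in for flink-conf.yaml, yarnSite for yarn-site.xml.
    static int resolve(Map<String, String> flinkConf, Map<String, String> yarnSite) {
        String flinkValue = flinkConf.get(FLINK_KEY);
        if (flinkValue != null) {
            // The Flink key wins whenever it is set; it is not capped here.
            return Integer.parseInt(flinkValue.trim());
        }
        String yarnValue = yarnSite.get(YARN_KEY);
        if (yarnValue != null) {
            // Otherwise the YARN cluster setting is used.
            return Integer.parseInt(yarnValue.trim());
        }
        // Neither is set: fall back to a single attempt.
        return FALLBACK;
    }

    public static void main(String[] args) {
        Map<String, String> yarnSite = Map.of(YARN_KEY, "2");
        // Flink key set to 4, YARN set to 2: the sketch returns 4, not 2.
        System.out.println(resolve(Map.of(FLINK_KEY, "4"), yarnSite));
        // Flink key unset: the YARN value is used.
        System.out.println(resolve(Map.of(), yarnSite));
        // Nothing set anywhere: the fallback applies.
        System.out.println(resolve(Map.of(), Map.of()));
    }
}
```

If the documentation were right, I would instead expect the first case to
be capped at the YARN value, i.e. something like the minimum of the two
settings.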

-- 
BR,
Stefano Baghino

Software Engineer @ Radicalbit
