I'm trying to understand Flink's YARN configuration. The flink-conf.yaml file is 
supposedly the way to configure Flink, except when you launch Flink on YARN, 
since some settings are then determined for the Application Master (AM). The 
following passage from the documentation is contradictory, or at least not 
completely clear:


"The system will use the configuration in conf/flink-config.yaml. Please follow 
our configuration 
guide<https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html> 
if you want to change something.

Flink on YARN will overwrite the following configuration parameters 
jobmanager.rpc.address (because the JobManager is always allocated at different 
machines), taskmanager.tmp.dirs (we are using the tmp directories given by 
YARN) and parallelism.default if the number of slots has been specified."

OK, so it will use conf/flink-conf.yaml, except for jobmanager.rpc.address/port, 
which are decided by YARN and not necessarily reported to the user, since they 
are allocated dynamically. That's fine with me, but if I want to make a 
"long-running" Flink cluster available to more than one user, where in Flink do 
I look up the Application Master hostname? Or do I have to scrape the log 
output (which would definitely be undesirable)? First, I thought Flink would 
write it to conf/flink-conf.yaml. It does not. Then I thought it must surely be 
written to the application's HDFS configuration directory (under something 
like hdfs://$USER/.flink/), but that file is merely a copy of the original 
conf/flink-conf.yaml and doesn't reflect the actual configuration of that 
application. So is there an accurate config somewhere in HDFS or on the 
ResourceManager? In other words, where could I find it programmatically (other 
than by manipulating YARN application names or scraping logs)?
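A minimal sketch of the ResourceManager route, using only the Python standard library. It assumes the YARN ResourceManager REST API (`/ws/v1/cluster/apps`, filtered with the `states` and `user` query parameters) is reachable at a base URL such as `http://rm-host:8088` (a hypothetical host; 8088 is the usual default web port). Because Flink on YARN runs the JobManager inside the AM container, the AM host is the JobManager host. Note that the port in `amHostHttpAddress` belongs to the NodeManager web UI, not to jobmanager.rpc.port, so this recovers only the hostname. The helper names `running_apps_url`, `fetch_json` and `am_hosts` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def running_apps_url(rm_base, user):
    """Build the ResourceManager REST URL listing one user's RUNNING applications."""
    query = urllib.parse.urlencode({"states": "RUNNING", "user": user})
    return f"{rm_base.rstrip('/')}/ws/v1/cluster/apps?{query}"


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON body (needs a reachable ResourceManager)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


def am_hosts(apps_json):
    """Map application id -> (name, AM hostname) from a /ws/v1/cluster/apps response."""
    # The RM returns {"apps": null} when nothing matches the filter.
    apps = (apps_json.get("apps") or {}).get("app") or []
    hosts = {}
    for app in apps:
        # amHostHttpAddress is "host:port"; the port is the NodeManager web UI
        # port, not jobmanager.rpc.port, so keep only the hostname.
        address = app.get("amHostHttpAddress", "")
        hosts[app["id"]] = (app.get("name", ""), address.rsplit(":", 1)[0])
    return hosts
```

For example, `am_hosts(fetch_json(running_apps_url("http://rm-host:8088", "someuser")))` would list each running application of the (hypothetical) user `someuser` together with the host its AM, and therefore its JobManager, runs on. Picking the right application out of that list is still left to the caller.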

Thanks,
Craig


