Hello Kelly,

I thought that the Flink scheduler only starts a job once all of the requested containers/TaskManagers (TMs) are available and allotted to that job.
How can I reproduce your issue on Flink with YARN?

Thank you,
Piper

On Thu, Nov 21, 2019, 1:48 PM Kelly Smith <kell...@zillowgroup.com> wrote:
> I’ve been running Flink in production on EMR (YARN) for some time and have found the metrics system to be quite useful, but there is one specific case where I’m missing a signal for this scenario:
>
> - When a job has been submitted but YARN does not have enough resources to run it
>
> Observed:
>
> - Job is in RUNNING state
> - All of the tasks for the job are in the (I believe) DEPLOYING state
>
> Is there a way to access these as metrics for monitoring the number of tasks in each state for a given job (image below)? The metric I’m currently using is the number of running jobs, but it misses this “unhealthy” scenario. I realize that I could use application-level metrics (record counts, etc.) as a proxy for this, but I’m working on providing a streaming platform and need all of my monitoring to be application agnostic.
>
> I can’t find anything on it in the documentation.
>
> Thanks,
>
> Kelly
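As a hedged illustration of the question in the quoted message (counting tasks per state for a given job, independent of the application), here is a minimal Python sketch that polls the JobManager's REST API instead of the metrics system. It assumes that `GET /jobs/<jobid>` returns a `vertices` array in which each vertex carries a `tasks` map from execution state (e.g. `DEPLOYING`, `RUNNING`) to task count; please verify that shape against the REST API documentation for the Flink version in use. The helper names, the `rest_url` value, the "stuck deploying" rule and the sample payload are all hypothetical.

```python
import json
import urllib.request
from collections import Counter


def count_tasks_by_state(job_details: dict) -> Counter:
    """Sum the per-vertex task counts into one Counter keyed by execution state.

    Assumption: job_details has the shape of Flink's GET /jobs/<jobid>
    response, where each entry in "vertices" has a "tasks" map of state -> count.
    """
    totals = Counter()
    for vertex in job_details.get("vertices", []):
        for state, count in vertex.get("tasks", {}).items():
            totals[state] += count
    return totals


def fetch_job_details(rest_url: str, job_id: str) -> dict:
    # Hypothetical helper: rest_url is a placeholder for the JobManager REST
    # address (on YARN this is typically reached through the proxy/tracking URL).
    with urllib.request.urlopen(f"{rest_url}/jobs/{job_id}") as resp:
        return json.load(resp)


def is_stuck_deploying(job_details: dict) -> bool:
    """Hypothetical health rule: the job reports RUNNING, it has tasks,
    but none of them has reached the RUNNING state yet."""
    counts = count_tasks_by_state(job_details)
    return (
        job_details.get("state") == "RUNNING"
        and sum(counts.values()) > 0
        and counts["RUNNING"] == 0
    )


if __name__ == "__main__":
    # Made-up payload mimicking the scenario above (not real output).
    sample = {
        "state": "RUNNING",
        "vertices": [
            {"name": "Source", "tasks": {"DEPLOYING": 2, "RUNNING": 0}},
            {"name": "Sink", "tasks": {"DEPLOYING": 1, "RUNNING": 0}},
        ],
    }
    print(dict(count_tasks_by_state(sample)))  # {'DEPLOYING': 3, 'RUNNING': 0}
    print(is_stuck_deploying(sample))  # True
```

A platform-level monitor could run a check like this per job on a schedule and export the per-state counts to its own metrics backend, which keeps it application agnostic; the trade-off is an extra polling component compared with a built-in Flink metric.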