Helen Weng created YARN-11636:
---------------------------------
Summary: App stuck in ACCEPTED state, however Yarn metric thinks
there are no pending apps in the queue
Key: YARN-11636
URL: https://issues.apache.org/jira/browse/YARN-11636
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.2.1
Reporter: Helen Weng
Hi, I recently encountered a case where an app gets stuck in the ACCEPTED state
indefinitely in a queue.
The queue was busy for about 4 hours, so being stuck in ACCEPTED during that
time is expected. However, even after resources became available and all other
jobs ran, this job remained stuck. I've checked the following:
1. Resources are available at the leaf queue and cluster level.
2. Other jobs can get the resources to run.
3. The maxAM limit is not being hit (there is only 1 other job running in the
queue during this time, and it is using near 0 resources).
4. The JMX metrics seem to think the app is running: AppsRunning says 1 and
containersRunning says 1, while AppsPending says 0. However, the app stays
in the ACCEPTED state and does not seem to be running.
Is this a known issue, or have others encountered it before? Do you have any
advice on what I can look into to debug it? Thanks very much for the help.
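
For anyone trying to reproduce the mismatch described in item 4, the following
is a minimal sketch that compares the apps the ResourceManager reports as
ACCEPTED with the AppsPending value in the queue's JMX metrics. It is not taken
from the report: the host, queue name, and queue path are hypothetical
placeholders, and it assumes that the RM web endpoints /ws/v1/cluster/apps and
/jmx are reachable and that each QueueMetrics bean carries a "tag.Queue"
attribute holding the full queue path.

```python
import json
from urllib.request import urlopen

RM_URL = "http://rm-host:8088"   # hypothetical ResourceManager web address
QUEUE_NAME = "default"           # hypothetical leaf queue name
QUEUE_PATH = "root.default"      # hypothetical full queue path


def fetch_json(url):
    """Fetch a URL and decode the response body as JSON."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def accepted_app_ids(apps_payload, queue):
    """Return ids of apps in `queue` whose state is ACCEPTED.

    `apps_payload` is the decoded /ws/v1/cluster/apps response; the "apps"
    value may be null when no app matches, so both levels are guarded.
    """
    apps = (apps_payload.get("apps") or {}).get("app") or []
    return [a.get("id") for a in apps
            if a.get("state") == "ACCEPTED" and a.get("queue") == queue]


def queue_metrics(jmx_payload, queue_path):
    """Return the queue-level QueueMetrics bean for `queue_path`, or None.

    Assumption: per-user beans are marked with a "tag.User" attribute and
    are skipped here.
    """
    for bean in jmx_payload.get("beans", []):
        if "name=QueueMetrics" not in bean.get("name", ""):
            continue
        if bean.get("tag.Queue") == queue_path and "tag.User" not in bean:
            return bean
    return None


def metrics_disagree(accepted_ids, bean):
    """True when more apps are ACCEPTED than the metric counts as pending."""
    return len(accepted_ids) > bean.get("AppsPending", 0)


if __name__ == "__main__":
    apps = fetch_json(f"{RM_URL}/ws/v1/cluster/apps?states=ACCEPTED")
    jmx = fetch_json(
        f"{RM_URL}/jmx?qry=Hadoop:service=ResourceManager,name=QueueMetrics,*")
    ids = accepted_app_ids(apps, QUEUE_NAME)
    bean = queue_metrics(jmx, QUEUE_PATH)
    if bean is None:
        print("No QueueMetrics bean found for", QUEUE_PATH)
    else:
        print("ACCEPTED apps:", ids)
        print("AppsPending:", bean.get("AppsPending"),
              "AppsRunning:", bean.get("AppsRunning"))
        print("Mismatch:", metrics_disagree(ids, bean))
```

Running this periodically while the app is stuck would show whether the metric
and the app state diverge at a specific point in time, which may help narrow
down which scheduler event updated the counters without moving the app out of
ACCEPTED.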