On 2/14/19 8:02 AM, Mahmood Naderan wrote:

One job is in the RH state, which means JobHoldMaxRequeue.
The output file specified by --output shows nothing suspicious.
Is there any way to analyze the stuck job?

This happens when a job has failed to start MAX_BATCH_REQUEUE times (currently 5).

Check your controller and slurmd logs to see what goes wrong when Slurm tries to start it.
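
If the logs are large, a small filter can pull out just the lines that mention the job. This is only a rough sketch: the helper names are hypothetical, and it assumes the job shows up in the log as "JobId=<id>" or "job <id>", which may differ between Slurm versions, so adjust the pattern to what your logs actually contain.

```python
import re
from typing import Iterable, List


def lines_for_job(lines: Iterable[str], job_id: int) -> List[str]:
    """Return the log lines that mention the given job ID."""
    # Assumption: the job is written as "JobId=<id>" or "job <id>".
    # The trailing \b keeps job 1234 from also matching job 12345.
    pattern = re.compile(rf"(?:JobId=|\bjob[ _]){job_id}\b", re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def scan_log(path: str, job_id: int) -> List[str]:
    """Read a log file and keep only the lines for one job."""
    # errors="replace" avoids a crash on stray non-UTF-8 bytes in the log.
    with open(path, errors="replace") as handle:
        return lines_for_job(handle, job_id)
```

You would call scan_log() once on the controller log and once on the slurmd log of the node the job was sent to. The log locations are site-specific; they are set by SlurmctldLogFile and SlurmdLogFile in slurm.conf, and "scontrol show config" will print them.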

All the best,
Chris
