Hi Tim,

On 10/7/24 11:13, Cutts, Tim via slurm-users wrote:
Something odd is going on on our cluster.  User has a lot of pending jobs in a job array (a few thousand).

$ squeue -u kmnx005 -r -t PD | head -5
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
       3045324_875      core run_scp_  kmnx005 PD       0:00      1 (JobArrayTaskLimit)
       3045324_876      core run_scp_  kmnx005 PD       0:00      1 (JobArrayTaskLimit)
       3045324_877      core run_scp_  kmnx005 PD       0:00      1 (JobArrayTaskLimit)
       3045324_878      core run_scp_  kmnx005 PD       0:00      1 (JobArrayTaskLimit)

None are getting scheduled.  But when I ask SLURM what that job’s priority is, it produces no output:

$ sprio -j 3045324
          JOBID PARTITION   PRIORITY       SITE        AGE  FAIRSHARE JOBSIZE  PARTITION        QOS                 TRES

Any clues what’s going on here?

What array limits do you have in slurm.conf?  For example:

$ scontrol show config | grep -i array
MaxArraySize            = 1001
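
If MaxArraySize looks fine, the JobArrayTaskLimit reason may instead come from the per-array throttle, i.e. the "%N" suffix given to --array at submission, which caps how many tasks of one array run at the same time. Below is a rough sketch for reading that value out of "scontrol show job" output. The helper name array_throttle and the sample line are hypothetical and only there for illustration; the field name ArrayTaskThrottle is what scontrol is expected to print for array jobs, but check it against your Slurm version.

```shell
# Hypothetical helper: print the first ArrayTaskThrottle value found in
# `scontrol show job` output read from stdin.
array_throttle() {
    grep -o 'ArrayTaskThrottle=[0-9]*' | head -n 1 | cut -d= -f2
}

# On a real cluster (job ID taken from the squeue output above):
#   scontrol show job 3045324 | array_throttle
#
# Made-up sample line, for illustration only (not real cluster output):
printf 'JobId=1 ArrayJobId=1 ArrayTaskId=0-9 ArrayTaskThrottle=4 JobName=demo\n' | array_throttle
```

If the throttle turns out to be the limiting factor, it can be changed on the pending array with "scontrol update JobId=<jobid> ArrayTaskThrottle=<count>"; according to the scontrol man page, a count of zero removes the limit.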

/Ole


