I should be clear: the JobArrayTaskLimit isn’t the issue (the user submitted the 
array with a %1 throttle, which is why we’re seeing that reason).  What I don’t 
understand is why the jobs remaining in the queue have no priority at all 
associated with them.  It’s as though the scheduler has forgotten the job array 
exists altogether.
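For readers unfamiliar with the `%1` suffix: it is the array task throttle. A submission such as `sbatch --array=1-5000%1 job.sh` lets only one task run at a time, and the rest pend with reason `JobArrayTaskLimit`. Below is a minimal sketch of relaxing the throttle on an existing array with `scontrol update ... ArrayTaskThrottle=`. It assumes the Slurm client tools are on `PATH`. The function name `relax_throttle` and the script name `job.sh` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: change how many tasks of an existing job array may run
# at once. Assumes Slurm's scontrol is on PATH; a count of 0 removes the limit.
# The original submission might have looked like (job.sh is hypothetical):
#   sbatch --array=1-5000%1 job.sh
relax_throttle() {
  local jobid="$1" limit="$2"
  # Reject anything that is not a non-negative integer before calling scontrol.
  case "$limit" in
    ''|*[!0-9]*) echo "limit must be a non-negative integer" >&2; return 1 ;;
  esac
  scontrol update JobId="$jobid" ArrayTaskThrottle="$limit"
}
```

For example, `relax_throttle 3045324 10` would allow up to ten tasks of the array to run concurrently. Raising the throttle would not by itself explain the missing priority, though.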

Tim

--
Tim Cutts
Scientific Computing Platform Lead
AstraZeneca

Find out more about R&D IT Data, Analytics & AI and how we can support you by 
visiting our Service 
Catalogue<https://azcollaboration.sharepoint.com/sites/CMU993>


From: Ole Holm Nielsen via slurm-users <slurm-users@lists.schedmd.com>
Date: Monday, 7 October 2024 at 10:35 AM
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Re: Jobs not getting scheduled, no priority calculation, 
but still in queue?
Hi Tim,

On 10/7/24 11:13, Cutts, Tim via slurm-users wrote:
> Something odd is going on on our cluster.  User has a lot of pending jobs
> in a job array (a few thousand).
>
> squeue -u kmnx005 -r -t PD | head -5
>
>               JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>         3045324_875      core run_scp_  kmnx005 PD       0:00      1 (JobArrayTaskLimit)
>         3045324_876      core run_scp_  kmnx005 PD       0:00      1 (JobArrayTaskLimit)
>         3045324_877      core run_scp_  kmnx005 PD       0:00      1 (JobArrayTaskLimit)
>         3045324_878      core run_scp_  kmnx005 PD       0:00      1 (JobArrayTaskLimit)
>
> None are getting scheduled.  But when I ask SLURM what that job’s priority
> is, it produces no output:
>
> $ sprio -j 3045324
>
>            JOBID PARTITION   PRIORITY       SITE        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS                 TRES
>
> Any clues what’s going on here?
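With a few thousand array tasks pending, it can help to confirm which pending reason dominates before digging into priority. The sketch below tallies reasons from output piped in from `squeue -u kmnx005 -r -t PD -h -o '%i %r'`. In that command, `-h` drops the header, `%i` is the job ID and `%r` is the pending reason. The function name `tally_reasons` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count pending array tasks per reason.
# Expects lines of the form "<jobid>_<task> <reason>" on stdin, e.g. from
#   squeue -u kmnx005 -r -t PD -h -o '%i %r'
# Prints "<reason> <count>", one line per reason, sorted by reason.
tally_reasons() {
  awk 'NF >= 2 { count[$2]++ } END { for (r in count) print r, count[r] }' | sort
}
```

As a separate cross-check, `squeue -j 3045324 -r -h -o '%i %Q'` prints each task's integer priority as squeue sees it. That output can be compared with the empty `sprio` result.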
What array limits do you have in slurm.conf?  For example:

$ scontrol show config | grep -i array
MaxArraySize            = 1001
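According to the `slurm.conf` documentation, the largest usable array task index is one less than `MaxArraySize`. With the value above, an index of 1000 is allowed but 1001 is not. Below is a minimal sketch of that check. It assumes the `scontrol show config` line format shown above, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking an array index against MaxArraySize.
# Extract the value from a line such as "MaxArraySize            = 1001".
max_array_size() {
  awk -F'=' '/^MaxArraySize/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Succeed if task index $1 is valid for MaxArraySize $2 (max index is $2 - 1).
index_allowed() {
  [ "$1" -ge 0 ] && [ "$1" -lt "$2" ]
}
```

Usage: `index_allowed 4999 "$(scontrol show config | max_array_size)"` returns success only if index 4999 fits under the cluster's configured limit.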

/Ole



--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
________________________________

AstraZeneca UK Limited is a company incorporated in England and Wales with 
registered number:03674842 and its registered office at 1 Francis Crick Avenue, 
Cambridge Biomedical Campus, Cambridge, CB2 0AA.

This e-mail and its attachments are intended for the above named recipient only 
and may contain confidential and privileged information. If they have come to 
you in error, you must not copy or show them to anyone; instead, please reply 
to this e-mail, highlighting the error to the sender and then immediately 
delete the message. For information about how AstraZeneca UK Limited and its 
affiliates may process information, personal data and monitor communications, 
please see our privacy notice at 
www.astrazeneca.com<https://www.astrazeneca.com>