Matt,

Depending on other parameters for the job, your '--ntasks=30' likely has the 
effect of requesting 30 (or more) cores for that individual job, which probably 
does not "fit" on an individual node (OverSubscribe allows multiple jobs to 
share a resource, but does not change the resource requirements of an 
individual job).

The best approach will depend on the particulars of the job itself, but setting 
"--ntasks-per-core" in conjunction with "--ntasks=30" would be one way to allow 
a job with more tasks than the core count of any of your nodes to run.
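As a rough sketch (the job name, partition, and executable here are placeholders, not taken from your setup), an .sbatch header along these lines would let 30 tasks land on a node with fewer than 30 cores by packing two tasks per physical core:

```shell
#!/bin/bash
#SBATCH --job-name=whatever        # placeholder job name
#SBATCH --ntasks=30                # 30 tasks total for the job
#SBATCH --ntasks-per-core=2        # allow up to two tasks per physical core

srun ./my_program                  # placeholder executable
```

Whether two tasks per core performs acceptably depends on the workload; for CPU-bound codes you may prefer fewer, larger nodes instead.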

Matt Jay
HPC Systems Engineer - Hyak
Research Computing
University of Washington Information Technology


From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of 
Matt Hohmeister
Sent: Thursday, September 26, 2019 1:56 PM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Running multiple jobs simultaneously

I just did that...beautiful...thanks! The "default" let me run 48 jobs 
concurrently across two nodes.

I've noticed that when I have "#SBATCH --ntasks=30" in my .sbatch file, the job 
still refuses to run, and I'm back at the below. Should I just ask my users not 
to use --ntasks in their .sbatch files?


[mhohmeis at odin ~]$ squeue

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
     2052_[70-100]     debug whatever mhohmeis PD       0:00      4 (PartitionConfig)

Matt Hohmeister
Systems and Network Administrator
Department of Psychology
Florida State University
PO Box 3064301
Tallahassee, FL 32306-4301
Phone: +1 850 645 1902
Fax: +1 850 644 7739
Pronouns: he/him/his

From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Matt Jay
Sent: Thursday, September 26, 2019 4:34 PM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Running multiple jobs simultaneously

Hi Matt,

Check out the "OverSubscribe" partition parameter.  Try setting your partition 
to "OverSubscribe=YES" and then submitting the jobs with the "--oversubscribe" 
option (or set "OverSubscribe=FORCE" if you want this to happen for all jobs 
submitted to the partition).  Either OverSubscribe setting can be followed by a 
colon and the maximum number of jobs that can share a resource (if I recall 
correctly, it defaults to 4, so you might want to increase it to allow the 
number of jobs you need - i.e., roughly the maximum number of jobs you need to 
run simultaneously divided by the number of cores available in the partition).
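For example (partition and node names below are placeholders, not taken from your configuration), a slurm.conf partition definition along these lines would force oversubscription with up to eight jobs sharing each resource:

```shell
# slurm.conf fragment -- partition and node names are placeholders
PartitionName=debug Nodes=node[01-02] Default=YES MaxTime=INFINITE State=UP OverSubscribe=FORCE:8
```

With FORCE, jobs do not need to pass --oversubscribe themselves; the partition applies it to every job submitted there.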

Matt Jay
HPC Systems Engineer - Hyak
Research Computing
University of Washington Information Technology
