Matt,

Depending on other parameters of the job, your '--ntasks=30' is likely having the effect of requesting 30 (or more) cores for that individual job, which likely does not "fit" on any individual node (oversubscribe allows multiple jobs to share a resource, but it doesn't change the resource request/requirements of an individual job).
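For example, you can check how many CPUs Slurm thinks each node has (format codes per the sinfo man page):

    sinfo -N -o "%N %c"    # node name and CPU count for each node

If each node reports, say, 24 CPUs (an assumed number here, substitute your own), then a script containing

    #SBATCH --ntasks=30    # 30 tasks -> 30 cores requested by default (one core per task)

is asking for more cores than any single node can provide, and no oversubscribe setting will change that per-job request.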
The best approach will depend on the particulars of the job itself, but setting "--ntasks-per-core" in conjunction with "--ntasks=30" would be one way to allow a job with more tasks than the core count of any of your nodes to run.

Matt Jay
HPC Systems Engineer - Hyak
Research Computing
University of Washington Information Technology

From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Matt Hohmeister
Sent: Thursday, September 26, 2019 1:56 PM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Running multiple jobs simultaneously

I just did that...beautiful...thanks! The "default" let me run 48 jobs concurrently across two nodes.

I've noticed that, still, when I have "#SBATCH --ntasks=30" in my .sbatch file, the job refuses to run, and I'm back at the below. Should I just ask my users not to use --ntasks in their .sbatch files?

[mhohmeis@odin ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
     2052_[70-100]     debug whatever mhohmeis PD       0:00      4 (PartitionConfig)

Matt Hohmeister
Systems and Network Administrator
Department of Psychology
Florida State University
PO Box 3064301
Tallahassee, FL 32306-4301
Phone: +1 850 645 1902
Fax: +1 850 644 7739
Pronouns: he/him/his

From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Matt Jay
Sent: Thursday, September 26, 2019 4:34 PM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Running multiple jobs simultaneously

Hi Matt,

Check out the "OverSubscribe" partition parameter. Try setting your partition to "OverSubscribe=YES" and then submitting the jobs with the "--oversubscribe" option (or OverSubscribe=FORCE if you want this to happen for all jobs submitted to the partition). Either oversubscribe option can be followed by a colon and the maximum number of jobs that can be assigned to a resource (IIRC it defaults to 4, so you might want to increase it to allow the number of jobs you need, i.e., the maximum number of jobs you need to run simultaneously divided by the number of cores available in the partition).
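For concreteness, something along these lines; the node names, CPU counts, and the FORCE limit below are placeholders for illustration, not values taken from your site:

    # slurm.conf (illustrative only)
    NodeName=node[1-2] CPUs=24 State=UNKNOWN
    PartitionName=debug Nodes=node[1-2] Default=YES OverSubscribe=FORCE:8 State=UP

With the partition set up that way, a job script could look roughly like this (program name is a placeholder):

    #!/bin/bash
    #SBATCH --partition=debug
    # Allow up to two tasks to share each core, so 30 tasks can be
    # placed even if no single node has 30 cores:
    #SBATCH --ntasks=30
    #SBATCH --ntasks-per-core=2
    # Only needed if the partition uses OverSubscribe=YES; with FORCE
    # the sharing is applied to every job automatically:
    #SBATCH --oversubscribe
    srun ./my_program

FORCE takes the choice away from users, which tends to be simpler to support; YES plus the per-job --oversubscribe option leaves it opt-in.

Matt Jay
HPC Systems Engineer - Hyak
Research Computing
University of Washington Information Technology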