Hi Matt,

Check out the "OverSubscribe" partition parameter. Try setting your partition to OverSubscribe=YES and then submitting the jobs with the --oversubscribe option (or OverSubscribe=FORCE if you want this to happen for all jobs submitted to the partition). Either OverSubscribe setting (YES or FORCE) can be followed by a colon and the maximum number of jobs that can be assigned to a resource (if I recall correctly it defaults to 4, so you may want to increase it to cover the number of jobs you need; that is, the maximum number of jobs you need to run simultaneously divided by the number of cores available in the partition).
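For example (a minimal sketch only; the count of 20 and the script name below are placeholders, and the partition/node names are taken from your squeue output), the partition definition in slurm.conf might look like:

    PartitionName=debug Nodes=odin,thor OverSubscribe=FORCE:20 State=UP

With OverSubscribe=YES instead of FORCE, each job would also need to opt in at submission time:

    sbatch --oversubscribe myjob.sbatch

So if, say, you need roughly 200 tasks running at once on a partition with roughly 40 cores total, a count of about 5 (200 / 40) after the colon would be the starting point.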
Matt Jay
HPC Systems Engineer - Hyak
Research Computing
University of Washington Information Technology

From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Matt Hohmeister
Sent: Thursday, September 26, 2019 9:14 AM
To: slurm-us...@schedmd.com
Subject: [slurm-users] Running multiple jobs simultaneously

I have a two-node cluster running Slurm, and I'm being asked about allowing multiple jobs (hundreds of jobs) to run simultaneously. Following is the scheduling part of my slurm.conf, which I changed to allow multiple jobs to run on each node:

# SCHEDULING
#DefMemPerCPU=0
FastSchedule=1
#MaxMemPerCPU=0
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core

For testing purposes, I'm running this job:

#!/bin/bash
#SBATCH --job-name=whatever
#SBATCH --output=slurmBatchLists_Aug19.out
#SBATCH --error=slurmBatchLists_Aug19.err
#SBATCH --partition=debug
#SBATCH --nodes=1
#SBATCH --array=70-100
#SBATCH --cpus-per-task=5

matlab -nodisplay -nojvm -r 'sampleSlurm($SLURM_ARRAY_TASK_ID);'

...which gives me the following squeue output:

[mhohmeis@odin ~]$ squeue
            JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
    1742_[82-100]     debug whatever mhohmeis PD  0:00     1 (Resources)
    1755_[70-100]     debug whatever mhohmeis PD  0:00     1 (Priority)
    1756_[70-100]     debug whatever mhohmeis PD  0:00     1 (Priority)
    1757_[70-100]     debug whatever mhohmeis PD  0:00     1 (Priority)
    1758_[70-100]     debug whatever mhohmeis PD  0:00     1 (Priority)
    1759_[70-100]     debug whatever mhohmeis PD  0:00     1 (Priority)
    1760_[70-100]     debug whatever mhohmeis PD  0:00     1 (Priority)
    1761_[70-100]     debug whatever mhohmeis PD  0:00     1 (Priority)
    1762_[70-100]     debug whatever mhohmeis PD  0:00     1 (Priority)
    1763_[70-100]     debug whatever mhohmeis PD  0:00     1 (Priority)
          1742_70     debug whatever mhohmeis  R  0:03     1 odin
          1742_71     debug whatever mhohmeis  R  0:03     1 odin
          1742_72     debug whatever mhohmeis  R  0:03     1 odin
          1742_73     debug whatever mhohmeis  R  0:03     1 odin
          1742_74     debug whatever mhohmeis  R  0:03     1 odin
          1742_75     debug whatever mhohmeis  R  0:03     1 odin
          1742_76     debug whatever mhohmeis  R  0:03     1 thor
          1742_77     debug whatever mhohmeis  R  0:03     1 thor
          1742_78     debug whatever mhohmeis  R  0:03     1 thor
          1742_79     debug whatever mhohmeis  R  0:03     1 thor
          1742_80     debug whatever mhohmeis  R  0:03     1 thor
          1742_81     debug whatever mhohmeis  R  0:03     1 thor

They're interested in allowing *all* these jobs to run simultaneously. Also, when they add #SBATCH --ntasks=30 to the above .sbatch file, this happens when they try to run it:

[mhohmeis@odin ~]$ squeue
            JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
    2052_[70-100]     debug whatever mhohmeis PD  0:00     4 (PartitionConfig)

Any thoughts? Thanks!

Matt Hohmeister
Systems and Network Administrator
Department of Psychology
Florida State University
PO Box 3064301
Tallahassee, FL 32306-4301
Phone: +1 850 645 1902
Fax: +1 850 644 7739
Pronouns: he/him/his
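For reference, a minimal sketch of the quoted test script with the opt-in flag added (this assumes the debug partition has been switched to OverSubscribe=YES as suggested above and the controller has re-read its configuration; the --oversubscribe directive is the only change to the quoted script):

    #!/bin/bash
    #SBATCH --job-name=whatever
    #SBATCH --output=slurmBatchLists_Aug19.out
    #SBATCH --error=slurmBatchLists_Aug19.err
    #SBATCH --partition=debug
    #SBATCH --nodes=1
    #SBATCH --array=70-100
    #SBATCH --cpus-per-task=5
    #SBATCH --oversubscribe

    matlab -nodisplay -nojvm -r 'sampleSlurm($SLURM_ARRAY_TASK_ID);'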