I believe the default value of the OverSubscribe parameter (NO) would prevent jobs from sharing a node. You may want to look at that setting and change it from the default.
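For example, something along these lines on the partition definition in slurm.conf would allow sharing (the partition name and node range here are only an illustration, not taken from your setup):

    # illustrative partition definition; the default is OverSubscribe=NO
    PartitionName=cloud Nodes=compute-[1-100] OverSubscribe=YES State=UP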
--
Brian D. Haymore
University of Utah Center for High Performance Computing
155 South 1452 East RM 405
Salt Lake City, UT 84112
Phone: 801-558-1150, Fax: 801-585-5366
http://bit.ly/1HO1N2C

On Sep 10, 2018 6:30 AM, Felix Wolfheimer <f.wolfhei...@googlemail.com> wrote:

No, this happens without the "OverSubscribe" parameter being set. I'm using custom resources, though:

GresTypes=some_resource
NodeName=compute-[1-100] CPUs=10 Gres=some_resource:10 State=CLOUD

Submission uses:

sbatch --nodes=1 --ntasks-per-node=1 --gres=some_resource:1

But I just tried it without requesting this custom resource, and it shows the same behavior, i.e., SLURM spins up N nodes when I submit N jobs to the queue, regardless of what the resource request of each job is.

On Mon, Sep 10, 2018 at 03:55 Brian Haymore <brian.haym...@utah.edu> wrote:

What do you have the OverSubscribe parameter set to on the partition you're using?

________________________________________
From: slurm-users [slurm-users-boun...@lists.schedmd.com] on behalf of Felix Wolfheimer [f.wolfhei...@googlemail.com]
Sent: Sunday, September 09, 2018 1:35 PM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Elastic Compute

I'm using the SLURM Elastic Compute feature and it works great in general. However, I noticed that there's a bit of inefficiency in the decision about the number of nodes which SLURM creates. Let's say I have the following configuration

NodeName=compute-[1-100] CPUs=10 State=CLOUD

and none of these nodes are up and running. Let's further say that I create 10 identical jobs and submit them at the same time using

sbatch --nodes=1 --ntasks-per-node=1

I expected that SLURM would figure out that 10 CPUs are required in total to serve all jobs and, thus, create a single compute node. However, SLURM triggers the creation of one node per job, i.e., 10 nodes are created. When the first of these ten nodes is ready to accept jobs, SLURM assigns all 10 submitted jobs to this single node, though. The other nine nodes which were created run idle and are terminated again after a while. I'm using "SelectType=select/cons_res" to schedule at the CPU level. Is there some knob which influences this behavior, or is this behavior hard-coded?
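For completeness, a rough sketch of how I reproduce this (the job script name test.sh is just a placeholder for any job script):

    # submit 10 identical single-task jobs at (almost) the same time
    for i in $(seq 1 10); do
        sbatch --nodes=1 --ntasks-per-node=1 test.sh
    done

    # then watch how many CLOUD nodes get powered up for them
    squeue
    sinfo -N -l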