On 2/20/19 12:08 AM, Marcus Wagner wrote:

Hi Prentice,


On 2/19/19 2:58 PM, Prentice Bisbal wrote:

--ntasks-per-node is meant to be used in conjunction with --nodes option. From https://slurm.schedmd.com/sbatch.html:

--ntasks-per-node=<ntasks>
    Request that ntasks be invoked on each node. If used with the
    --ntasks option, the --ntasks option will take precedence
    and the --ntasks-per-node will be treated as a maximum count
    of tasks per node. Meant to be used with the --nodes option...

Yes, but used together with --ntasks it would simply mean at most 48 tasks per node. I don't see where the difference lies as far as submitting the job is concerned. Even if the semantics (how, and how many, cores get scheduled onto which number of hosts) were wrong, the syntax should at least be accepted.

The difference would be in how Slurm treats those specifications internally. To us humans, what you describe seems logical, but if Slurm wasn't programmed to behave that way, it won't. I quoted the documentation because, to me at least, it implies that Slurm isn't programmed to behave like that. Looking at the source code or asking SchedMD could confirm it.


If you don't specify --ntasks, it defaults to --ntasks=1, as Andreas said. From https://slurm.schedmd.com/sbatch.html:

-n, --ntasks=<number>
    sbatch does not launch tasks, it requests an allocation of
    resources and submits a batch script. This option advises the
    Slurm controller that job steps run within the allocation will
    launch a maximum of number tasks and to provide for sufficient
    resources. The default is one task per node, but note that the
    --cpus-per-task option will change this default.
So the correct way to specify your job is either like this:

--ntasks=48

or

--nodes=1 --ntasks-per-node=48

Specifying both --ntasks-per-node and --ntasks at the same time is not correct.
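
As a complete (if minimal) batch script, the second variant would look like this (just a sketch; srun hostname stands in for the real program):

#!/usr/bin/env bash
# one node, with 48 tasks placed on it
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=48

srun hostname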

Funnily enough, the result is the same:

$> sbatch -N 1 --ntasks-per-node=48 --wrap hostname
sbatch: error: CPU count per node can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available

whereas using just --ntasks=48, the job gets submitted and scheduled onto one host:

$> sbatch --ntasks=48 --wrap hostname
sbatch: [I] No output file given, set to: output_%j.txt
sbatch: [I] No runtime limit given, set to: 15 minutes
Submitted batch job 199784
$> scontrol show job 199784 | egrep "NumNodes|TRES"
   NumNodes=1 NumCPUs=48 NumTasks=48 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=48,mem=182400M,node=1,billing=48

To me, this still looks like a bug, not like wrong usage of the submission parameters.

Either a bug, or there's something subtly wrong with your slurm.conf. I would continue troubleshooting by simplifying both your node definition and your SelectType options as much as possible and seeing whether the problem persists. Also, look at 'scontrol show node <node name>' to check whether your definition in slurm.conf lines up with how Slurm actually sees the node. I don't think I've seen that output anywhere in this thread yet.
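
For example (output trimmed to the interesting fields; the values below are illustrative, not taken from your system):

$> scontrol show node ncm0001 | egrep "CoresPerSocket|CPUTot|ThreadsPerCore"
   NodeName=ncm0001 Arch=x86_64 CoresPerSocket=12
   CPUAlloc=0 CPUTot=48 CPULoad=0.01
   State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=10541

CPUTot, Sockets, CoresPerSocket, ThreadsPerCore and RealMemory should all agree with slurm.conf and with what 'slurmd -C' prints on the node itself.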


Does no one else use nodes in this shared way?
If nodes are shared, do you schedule by hardware threads or by cores?
If you schedule by cores, how did you implement this in Slurm?


Best
Marcus


Prentice
On 2/14/19 1:09 AM, Henkel, Andreas wrote:
Hi Marcus,

What just came to my mind: if you don't set --ntasks, isn't the default just 1? All the examples I know that use --ntasks-per-node also set --ntasks, with ntasks >= ntasks-per-node.

Best,
Andreas

On 14.02.2019 at 06:33, Marcus Wagner <wag...@itc.rwth-aachen.de> wrote:

Hi all,

I have narrowed this down a little bit.

The really astonishing thing is that if I use

--ntasks=48

I can submit the job, it will be scheduled onto one host:

    NumNodes=1 NumCPUs=48 NumTasks=48 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
    TRES=cpu=48,mem=182400M,node=1,billing=48

but as soon as I change --ntasks to --ntasks-per-node (which should amount to the same thing, since --ntasks=48 gets scheduled onto one host anyway), I get the error:

sbatch: error: CPU count per node can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available


Is there no one else who observes this behaviour?
Any explanations?


Best
Marcus


On 2/13/19 1:48 PM, Marcus Wagner wrote:
Hi all,

I have a strange behaviour here.
We are using slurm 18.08.5-2 on CentOS 7.6.

Let me first describe our compute nodes:
NodeName=ncm[0001-1032] CPUs=48 Sockets=4 CoresPerSocket=12 ThreadsPerCore=2 RealMemory=185000 Feature=skx8160,hostok,hpcwork Weight=10541 State=UNKNOWN

We have the following config set:

$> scontrol show config | grep -i select
SelectType              = select/cons_res
SelectTypeParameters    = CR_CORE_MEMORY,CR_ONE_TASK_PER_CORE
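
Just to spell out the arithmetic implied by the node definition above:

cores per node      = Sockets * CoresPerSocket        = 4 * 12 = 48
hw threads per node = cores per node * ThreadsPerCore = 48 * 2 = 96

i.e. CPUs=48 counts cores, not hardware threads.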


So, I have 48 cores on one node. According to the manpage of sbatch, I should be able to do the following:

#SBATCH --ntasks=48
#SBATCH --ntasks-per-node=48

But I get the following error:
sbatch: error: Batch job submission failed: Requested node configuration is not available


Does anyone have an explanation for this?


Best
Marcus




--
Marcus Wagner, Dipl.-Inf.

IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de
