Here it goes:

##################BEGIN SLURM.CONF#######################
ClusterName=foner
ControlMachine=foner1,foner2
ControlAddr=slurm-server
#BackupController=
#BackupAddr=
#
SlurmUser=slurm
#SlurmdUser=root
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
CryptoType=crypto/munge
JobCredentialPrivateKey=/etc/slurm/private.key
JobCredentialPublicCertificate=/etc/slurm/public.key
StateSaveLocation=/SLURM
SlurmdSpoolDir=/var/log/slurm/spool_slurmd/
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
#ProctrackType=proctrack/pgid
ProctrackType=proctrack/linuxproc
TaskPlugin=task/affinity
TaskPluginParam=Cpusets
#PluginDir=
CacheGroups=0
#FirstJobId=
ReturnToService=0
#MaxJobCount=
#PlugStackConfig=
#PropagatePrioProcess=
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#Prolog=/data/scripts/prolog_ctld.sh
#Prolog=
Epilog=/data/scripts/epilog.sh
#SrunProlog=
#SrunEpilog=
#TaskProlog=
#TaskEpilog=
#TaskPlugin=
#TrackWCKey=no
#TreeWidth=50
#TmpFS=
#UsePAM=
#UsePAM=1
#
# TIMERS
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
#
# SCHEDULING
SchedulerType=sched/backfill
#SchedulerAuth=
#SchedulerPort=
#SchedulerRootFilter=
#SelectType=select/linear
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory,CR_CORE_DEFAULT_DIST_BLOCK
FastSchedule=1
PriorityType=priority/multifactor
#PriorityDecayHalfLife=14-0
#PriorityUsageResetPeriod=14-0
PriorityWeightFairshare=0
PriorityWeightAge=0
PriorityWeightPartition=0
PriorityWeightJobSize=0
PriorityWeightQOS=1000
#PriorityMaxAge=1-0
#
# LOGGING
SlurmctldDebug=5
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=5
SlurmdLogFile=/var/log/slurm/slurmd.log
JobCompType=jobcomp/none
#JobCompLoc=
#
# ACCOUNTING
#JobAcctGatherType=jobacct_gather/linux
#JobAcctGatherFrequency=30
#
#AccountingStorageType=accounting_storage/slurmdbd
##AccountingStorageHost=slurm-server
#AccountingStorageLoc=
#AccountingStoragePass=
#AccountingStorageUser=
#
AccountingStorageEnforce=qos
AccountingStorageLoc=slurm_acct_db
AccountingStorageType=accounting_storage/slurmdbd
AccountingStoragePort=8544
AccountingStorageUser=root
#AccountingStoragePass=slurm
AccountingStorageHost=slurm-server
# ACCT_GATHER
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=60
#AcctGatherEnergyType=acct_gather_energy/rapl
#AcctGatherNodeFreq=30
# Memory
#DefMemPerCPU=1024 # 1GB
#MaxMemPerCPU=3072 # 3GB
# COMPUTE NODES
NodeName=foner[11-14] Procs=20 RealMemory=258126 Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 State=UNKNOWN
NodeName=foner[101-142] CPUs=20 Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 RealMemory=64398 State=UNKNOWN
PartitionName=thin Nodes=foner[103-142] Shared=NO PreemptMode=CANCEL State=UP MaxTime=4320 MinNodes=2
PartitionName=thin_test Nodes=foner[101,102] Default=YES Shared=NO PreemptMode=CANCEL State=UP MaxTime=60 MaxNodes=1
PartitionName=fat Nodes=foner[11-14] Shared=NO PreemptMode=CANCEL State=UP MaxTime=4320 MaxNodes=1
##################END SLURM.CONF#######################
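Since the distribution behaviour discussed below hinges on the select and task plugin settings above, a quick way to double-check what the controller actually loaded is to query the running daemon. This is only a sketch, assuming scontrol is available on a login node of the foner cluster:

# Select plugin and task affinity settings as parsed by slurmctld
scontrol show config | grep -E -i 'SelectType|TaskPlugin|FastSchedule'
# Geometry of one thin node and the limits of the thin partition
scontrol show node foner101
scontrol show partition thin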
On 07/04/14 17:40, Mehdi Denou wrote:

Could you provide us the slurm.conf?

On 04/04/2014 14:46, Joan Arbona wrote:

Doesn't work either. I also tried with -m block:block with no luck...

On 04/04/14 14:13, Mehdi Denou wrote:

Of course, -N 1 is wrong, since you request more CPUs than are available on one node. I didn't read your mail to the end, sorry.

Try with: -n 25 -m plane=20

On 04/04/2014 13:57, Joan Arbona wrote:

Not working; it just says that I cannot use more processors than permitted:

srun: error: Unable to create job step: More processors requested than permitted

Thanks

On 04/04/14 13:50, Mehdi Denou wrote:

Try with: srun -N 1 -n 25

On 04/04/2014 13:47, Joan Arbona wrote:

Excuse me, I confused "Nodes" with "Tasks". When I wrote "Nodes" in the last e-mail I meant "tasks". Let me explain it again with an example:

My cluster has 2 nodes with 20 processors per node. I want to allocate all 40 processors and both nodes in sbatch. Then I have to execute a job step with srun on a subset of 25 processors. I want SLURM to fill nodes completely, that is, to use all 20 processors of the first node and 5 of the second one. If I execute an sbatch script like this:

#!/bin/bash
[...]
#SBATCH --nodes=2
#SBATCH --ntasks=40
srun -n25 hostname

it does not work: it executes 12 hostname tasks on the first node and 13 on the second, when it should execute 20 on the first and 5 on the second.

Thanks and sorry for the confusion,
Joan

On 04/04/14 13:22, Mehdi Denou wrote:

It's a little bit confusing:

"When in sbatch I specify that I want to allocate 25 nodes and I execute"

So that means -N 25. For example, if you want to allocate 40 nodes and then execute srun on 25 of them:

#!/bin/bash
#SBATCH -N 40
srun -N 25 hostname

-n is the number of tasks (the number of system processes); -N or --nodes is the number of nodes. If you don't specify -n, it is set to 1 by default.

On 04/04/2014 11:24, Joan Arbona wrote:

Thanks for the answer. No luck anyway.

When in sbatch I specify that I want to allocate 25 nodes and I execute srun without parameters, it works. However, if I specify that I want to allocate 40 nodes and then execute srun selecting only 25 of them, it does not work. That is:

---
1.

#!/bin/bash
[...]
#SBATCH --nodes=2
#SBATCH --ntasks=25
srun hostname

-> Works, but we don't want it because we need srun to select a subset of the requested nodes.

---
2.

#!/bin/bash
[...]
#SBATCH --nodes=2
#SBATCH --ntasks=40
srun -n25 hostname

-> Doesn't work. It executes half of the processes on the first node and the other half on the second. Also tried removing --nodes=2.

---

It seems that it is the way sbatch influences srun. Is there any way to see which parameters the sbatch call transfers to srun?

Thanks,
Joan
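On that last question: sbatch exports the allocation to the batch script as SLURM_* environment variables (SLURM_NTASKS, SLURM_JOB_NODELIST, SLURM_TASKS_PER_NODE, and so on), and srun takes most of its defaults from them. A minimal sketch for inspecting them, assuming the same two-node allocation as in example 2 above:

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks=40
# Dump the variables sbatch sets for this allocation; srun derives its
# default task count, node list and distribution from these.
env | grep '^SLURM_' | sort
srun -n25 hostname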
On 04/04/14 10:54, Mehdi Denou wrote:

Hello,

You should take a look at the parameter --mincpu.

On 04/04/2014 10:22, Joan Arbona wrote:

Hello all,

We have a cluster with 40 nodes and 20 cores per node, and we are trying to distribute job steps executed with sbatch "in blocks". That means we want to fill the maximum number of nodes and, if the number of tasks is not a multiple of 20, to have only one node with some of its cores idle. For example, if we executed a task on 25 cores, we would have node 1 with all 20 cores reserved and node 2 with only 5 cores reserved.

If we execute srun -n25 -pthin hostname, it works fine and produces the following output:

foner118
foner118
foner118
foner118
foner118
foner118
foner118
foner118
foner118
foner118
foner118
foner118
foner118
foner118
foner118
foner118
foner118
foner118
foner118
foner118
foner119
foner119
foner119
foner119
foner119

However, when we execute this from an sbatch script it does not work at all. I have tried it with every configuration I know of and with every parameter that seemed useful. Instead, it executes 13 processes on the first node and 12 processes on the second node. This is our sbatch script:

#!/bin/bash
#SBATCH --job-name=prova_joan
#SBATCH --partition=thin
#SBATCH --output=WRFJobName-%j.out
#SBATCH --error=WRFJobName-%j.err
#SBATCH --nodes=2
#SBATCH --ntasks=40

srun -n25 --exclusive hostname &
wait

I have already tried removing the --exclusive and the & without success.

To sum up, the question is: what is the way to group the tasks of a job step so that they fill as many nodes as possible with sbatch?
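For the record, a sketch of the plane-distribution variant Mehdi suggests further up the thread (-n 25 -m plane=20), untested here: with a plane size equal to the cores per node, the intent is that the step is laid out as 20 tasks on the first node and 5 on the second.

#!/bin/bash
#SBATCH --partition=thin
#SBATCH --nodes=2
#SBATCH --ntasks=40
# Plane distribution with a block of 20 tasks per node; the goal is a
# 20 + 5 layout for a 25-task step instead of 13 + 12.
srun -n 25 -m plane=20 hostname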
Thanks,
Joan

PS: Attaching slurm.conf.

--
Joan Francesc Arbona
Ext. 2582
Centre de Tecnologies de la Informació
Universitat de les Illes Balears
http://jfdeu.wordpress.com
http://guifisoller.wordpress.com