Not working, it just says that the step requests more processors than permitted:

srun: error: Unable to create job step: More processors requested than permitted

Thanks
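A minimal sketch of why that last attempt (srun -N 1 -n 25) is rejected, assuming the 20-CPU "thin" nodes declared in the slurm.conf quoted at the bottom of this thread; exact scontrol field names can vary slightly between SLURM versions:

# Each thin node is defined with CPUs=20, so a step confined to a single
# node (-N 1) cannot hold 25 tasks and slurmctld refuses to create it.
scontrol show node foner118 | grep -o "CPUTot=[0-9]*"   # expect CPUTot=20
# 25 tasks need 25 CPUs, but one node only offers 20; letting the step
# span both allocated nodes (e.g. srun -N 2 -n 25) keeps it within limits.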
On 04/04/14 13:50, Mehdi Denou wrote:
>
> Try with:
> srun -N 1 -n 25
>
> On 04/04/2014 13:47, Joan Arbona wrote:
>> Excuse me, I confused "Nodes" with "Tasks". When I wrote "Nodes" in the last e-mail I meant "tasks".
>>
>> Let me explain it again with an example:
>>
>> My cluster has 2 nodes with 20 processors/node. I want to allocate all 40 processors and both nodes in sbatch. Then I have to execute a job step with srun on a subset of 25 processors. I want SLURM to completely fill as many nodes as possible: that is, to use all 20 processors of the first node and 5 of the second one.
>>
>> If I execute an sbatch like this:
>>
>> #!/bin/bash
>> [...]
>> #SBATCH --nodes=2
>> #SBATCH --ntasks=40
>> srun -n25 hostname
>>
>> it does not work: it executes 12 hostname tasks on the first node and 13 on the second one, when it should execute 20 on the first one and 5 on the second one.
>>
>> Thanks and sorry for the confusion,
>> Joan
>>
>> On 04/04/14 13:22, Mehdi Denou wrote:
>>> It's a little bit confusing:
>>>
>>> When in sbatch I specify that I want to allocate 25 nodes and I execute
>>>
>>> So it means -N 25.
>>> For example, if you want to allocate 40 nodes and then execute srun on 25:
>>>
>>> #!/bin/bash
>>> #SBATCH -N 40
>>>
>>> srun -N 25 hostname
>>>
>>> -n is the number of tasks (the number of system processes).
>>> -N or --nodes is the number of nodes.
>>>
>>> If you don't specify -n, it is set to 1 by default.
>>>
>>> On 04/04/2014 11:24, Joan Arbona wrote:
>>>> Thanks for the answer. No luck anyway.
>>>> When in sbatch I specify that I want to allocate 25 nodes and I execute srun without parameters, it works. However, if I specify that I want to allocate 40 nodes and then execute srun selecting only 25 of them, it does not work.
>>>>
>>>> That is:
>>>>
>>>> ---
>>>>
>>>> 1.
>>>> #!/bin/bash
>>>> [...]
>>>> #SBATCH --nodes=2
>>>> #SBATCH --ntasks=25
>>>>
>>>> srun hostname
>>>>
>>>> -> Works, but we don't want it because we need srun to select a subset of the requested nodes.
>>>>
>>>> ---
>>>>
>>>> 2.
>>>> #!/bin/bash
>>>> [...]
>>>> #SBATCH --nodes=2
>>>> #SBATCH --ntasks=40
>>>>
>>>> srun -n25 hostname
>>>>
>>>> -> Doesn't work. It executes half of the processes on the first node and the other half on the second. I also tried removing --nodes=2.
>>>>
>>>> ---
>>>>
>>>> It seems to be the way sbatch influences srun. Is there any way to see which parameters the sbatch call transfers to srun?
>>>>
>>>> Thanks,
>>>> Joan
>>>>
>>>> On 04/04/14 10:54, Mehdi Denou wrote:
>>>>> Hello,
>>>>>
>>>>> You should take a look at the parameter --mincpu
>>>>>
>>>>> On 04/04/2014 10:22, Joan Arbona wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> We have a cluster with 40 nodes and 20 cores per node, and we are trying to distribute job steps executed with sbatch "in blocks". That means we want to fill as many nodes as possible completely and, if the number of tasks is not a multiple of 20, to have only one node with idle cores. For example, if we executed a task on 25 cores, we would have node 1 with all 20 cores reserved and node 2 with only 5 cores reserved.
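A quick way to count how a step's tasks actually land on the nodes, assuming only the thin partition named in this thread and the standard hostname/sort/uniq tools; this just condenses the interactive test quoted below:

# Count how many of the 25 tasks run on each node; the block placement
# described above would report 20 for one hostname and 5 for the other.
srun -n 25 -p thin hostname | sort | uniq -c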
>>>>>> If we execute
>>>>>>
>>>>>> srun -n25 -pthin hostname
>>>>>>
>>>>>> it works fine and produces the following output:
>>>>>>
>>>>>> foner118
>>>>>> foner118
>>>>>> foner118
>>>>>> foner118
>>>>>> foner118
>>>>>> foner118
>>>>>> foner118
>>>>>> foner118
>>>>>> foner118
>>>>>> foner118
>>>>>> foner118
>>>>>> foner118
>>>>>> foner118
>>>>>> foner118
>>>>>> foner118
>>>>>> foner118
>>>>>> foner118
>>>>>> foner118
>>>>>> foner118
>>>>>> foner118
>>>>>> foner119
>>>>>> foner119
>>>>>> foner119
>>>>>> foner119
>>>>>> foner119
>>>>>>
>>>>>> However, when we execute this in an sbatch script it does not work at all. I have tried every configuration and every relevant parameter I know of. Instead it executes 13 processes on the first node and 12 processes on the second node.
>>>>>>
>>>>>> This is our sbatch script:
>>>>>>
>>>>>> #!/bin/bash
>>>>>> #SBATCH --job-name=prova_joan
>>>>>> #SBATCH --partition=thin
>>>>>> #SBATCH --output=WRFJobName-%j.out
>>>>>> #SBATCH --error=WRFJobName-%j.err
>>>>>> #SBATCH --nodes=2
>>>>>> #SBATCH --ntasks=40
>>>>>>
>>>>>> srun -n25 --exclusive hostname &
>>>>>>
>>>>>> wait
>>>>>>
>>>>>> I have already tried removing the --exclusive and the & without success.
>>>>>>
>>>>>> To sum up, the question is: what is the way to group the tasks of job steps so that they fill as many nodes as possible with sbatch?
>>>>>>
>>>>>> Thanks,
>>>>>> Joan
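One mechanism not tried in this thread that can pin exact per-node task counts inside an existing allocation is srun's arbitrary distribution (-m arbitrary), which takes the task-to-node mapping from the file named in SLURM_HOSTFILE. A minimal sketch under the 2-node, 20-cores-per-node allocation above; the hosts.txt name and the fixed 20/5 split are illustrative, and it is untested on this cluster:

#!/bin/bash
#SBATCH --partition=thin
#SBATCH --nodes=2
#SBATCH --ntasks=40

# Build a host file with one line per task: 20 lines naming the first
# allocated node and 5 naming the second, then launch with -m arbitrary
# so srun places task i on the node named on line i of the file.
NODES=($(scontrol show hostnames "$SLURM_JOB_NODELIST"))
{ for i in $(seq 1 20); do echo "${NODES[0]}"; done
  for i in $(seq 1 5);  do echo "${NODES[1]}"; done; } > hosts.txt
export SLURM_HOSTFILE=$PWD/hosts.txt
srun -n 25 -m arbitrary hostname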
>>>>>> PS: Attaching slurm.conf:
>>>>>>
>>>>>> ##################BEGIN SLURM.CONF#######################
>>>>>> ClusterName=foner
>>>>>> ControlMachine=foner1,foner2
>>>>>> ControlAddr=slurm-server
>>>>>> #BackupController=
>>>>>> #BackupAddr=
>>>>>> #
>>>>>> SlurmUser=slurm
>>>>>> #SlurmdUser=root
>>>>>> SlurmctldPort=6817
>>>>>> SlurmdPort=6818
>>>>>> AuthType=auth/munge
>>>>>> CryptoType=crypto/munge
>>>>>> JobCredentialPrivateKey=/etc/slurm/private.key
>>>>>> JobCredentialPublicCertificate=/etc/slurm/public.key
>>>>>> StateSaveLocation=/SLURM
>>>>>> SlurmdSpoolDir=/var/log/slurm/spool_slurmd/
>>>>>> SwitchType=switch/none
>>>>>> MpiDefault=none
>>>>>> SlurmctldPidFile=/var/run/slurm/slurmctld.pid
>>>>>> SlurmdPidFile=/var/run/slurmd.pid
>>>>>> #ProctrackType=proctrack/pgid
>>>>>> ProctrackType=proctrack/linuxproc
>>>>>> TaskPlugin=task/affinity
>>>>>> TaskPluginParam=Cpusets
>>>>>> #PluginDir=
>>>>>> CacheGroups=0
>>>>>> #FirstJobId=
>>>>>> ReturnToService=0
>>>>>> #MaxJobCount=
>>>>>> #PlugStackConfig=
>>>>>> #PropagatePrioProcess=
>>>>>> #PropagateResourceLimits=
>>>>>> #PropagateResourceLimitsExcept=
>>>>>> #Prolog=/data/scripts/prolog_ctld.sh
>>>>>> #Prolog=
>>>>>> Epilog=/data/scripts/epilog.sh
>>>>>> #SrunProlog=
>>>>>> #SrunEpilog=
>>>>>> #TaskProlog=
>>>>>> #TaskEpilog=
>>>>>> #TaskPlugin=
>>>>>> #TrackWCKey=no
>>>>>> #TreeWidth=50
>>>>>> #TmpFS=
>>>>>> #UsePAM=
>>>>>> #UsePAM=1
>>>>>> #
>>>>>> # TIMERS
>>>>>> SlurmctldTimeout=300
>>>>>> SlurmdTimeout=300
>>>>>> InactiveLimit=0
>>>>>> MinJobAge=300
>>>>>> KillWait=30
>>>>>> Waittime=0
>>>>>> #
>>>>>> # SCHEDULING
>>>>>> SchedulerType=sched/backfill
>>>>>> #SchedulerAuth=
>>>>>> #SchedulerPort=
>>>>>> #SchedulerRootFilter=
>>>>>> #SelectType=select/linear
>>>>>> SelectType=select/cons_res
>>>>>> SelectTypeParameters=CR_Core_Memory,CR_CORE_DEFAULT_DIST_BLOCK
>>>>>> FastSchedule=1
>>>>>> PriorityType=priority/multifactor
>>>>>> #PriorityDecayHalfLife=14-0
>>>>>> #PriorityUsageResetPeriod=14-0
>>>>>> PriorityWeightFairshare=0
>>>>>> PriorityWeightAge=0
>>>>>> PriorityWeightPartition=0
>>>>>> PriorityWeightJobSize=0
>>>>>> PriorityWeightQOS=1000
>>>>>> #PriorityMaxAge=1-0
>>>>>> #
>>>>>> # LOGGING
>>>>>> SlurmctldDebug=5
>>>>>> SlurmctldLogFile=/var/log/slurm/slurmctld.log
>>>>>> SlurmdDebug=5
>>>>>> SlurmdLogFile=/var/log/slurm/slurmd.log
>>>>>> JobCompType=jobcomp/none
>>>>>> #JobCompLoc=
>>>>>> #
>>>>>> # ACCOUNTING
>>>>>> #JobAcctGatherType=jobacct_gather/linux
>>>>>> #JobAcctGatherFrequency=30
>>>>>> #
>>>>>> #AccountingStorageType=accounting_storage/slurmdbd
>>>>>> ##AccountingStorageHost=slurm-server
>>>>>> #AccountingStorageLoc=
>>>>>> #AccountingStoragePass=
>>>>>> #AccountingStorageUser=
>>>>>> #
>>>>>> AccountingStorageEnforce=qos
>>>>>> AccountingStorageLoc=slurm_acct_db
>>>>>> AccountingStorageType=accounting_storage/slurmdbd
>>>>>> AccountingStoragePort=8544
>>>>>> AccountingStorageUser=root
>>>>>> #AccountingStoragePass=slurm
>>>>>> AccountingStorageHost=slurm-server
>>>>>> # ACCT_GATHER
>>>>>> JobAcctGatherType=jobacct_gather/linux
>>>>>> JobAcctGatherFrequency=60
>>>>>> #AcctGatherEnergyType=acct_gather_energy/rapl
>>>>>> #AcctGatherNodeFreq=30
>>>>>>
>>>>>> # Memory
>>>>>> #DefMemPerCPU=1024 # 1GB
>>>>>> #MaxMemPerCPU=3072 # 3GB
>>>>>>
>>>>>> # COMPUTE NODES
>>>>>> NodeName=foner[11-14] Procs=20 RealMemory=258126 Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 State=UNKNOWN
>>>>>>
>>>>>> NodeName=foner[101-142] CPUs=20 Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 RealMemory=64398 State=UNKNOWN
>>>>>>
>>>>>> PartitionName=thin Nodes=foner[103-142] Shared=NO PreemptMode=CANCEL State=UP MaxTime=4320 MinNodes=2
>>>>>> PartitionName=thin_test Nodes=foner[101,102] Default=YES Shared=NO PreemptMode=CANCEL State=UP MaxTime=60 MaxNodes=1
>>>>>> PartitionName=fat Nodes=foner[11-14] Shared=NO PreemptMode=CANCEL State=UP MaxTime=4320 MaxNodes=1
>>>>>>
>>>>>> ##################END SLURM.CONF#######################

--
Joan Francesc Arbona
Ext. 2582

Centre de Tecnologies de la Informació
Universitat de les Illes Balears

http://jfdeu.wordpress.com
http://guifisoller.wordpress.com
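A minimal sketch for double-checking which selection and task settings the running controller actually applies on the cluster above, assuming shell access to a login node; the grep pattern is only illustrative:

# Print the select/task settings slurmctld is really using. Note that
# CR_CORE_DEFAULT_DIST_BLOCK governs how cores are chosen inside a node,
# not how a step's tasks are spread across the allocated nodes.
scontrol show config | grep -iE "SelectType|TaskPlugin"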