Hi Emre,

MAX_TASKS_PER_NODE is set to 512. Does this mean I cannot run more than 512 jobs in parallel on one node? Or can I change MAX_TASKS_PER_NODE to a higher value and recompile Slurm?
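If recompiling is the way to go, this is roughly what I had in mind (just a sketch on my side; I am assuming a build from the Slurm source tree into the same /opt/slurm prefix we already use, and the exact header location may differ between versions):

    # find the compile-time constant the srun man page refers to
    grep -rn "MAX_TASKS_PER_NODE" slurm/

    # edit the definition before building, e.g. (new value picked arbitrarily):
    #   #define MAX_TASKS_PER_NODE 512   ->   #define MAX_TASKS_PER_NODE 1024

    # then rebuild and reinstall into our existing prefix
    ./configure --prefix=/opt/slurm
    make -j
    make install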
Regards,
Karl

On 14/09/2021 21:47, Emre Brookes wrote:
> *-O*, *--overcommit*
>     Overcommit resources. When applied to job allocation, only one CPU
>     is allocated to the job per node and options used to specify the
>     number of tasks per node, socket, core, etc. are ignored. When
>     applied to job step allocations (the *srun* command when executed
>     within an existing job allocation), this option can be used to
>     launch more than one task per CPU. Normally, *srun* will not
>     allocate more than one process per CPU. By specifying *--overcommit*
>     you are explicitly allowing more than one process per CPU. However
>     no more than *MAX_TASKS_PER_NODE* tasks are permitted to execute per
>     node. NOTE: *MAX_TASKS_PER_NODE* is defined in the file /slurm.h/
>     and is not a variable, it is set at Slurm build time.
>
> I have used this successfully to run more jobs than cpus/cores avail.
>
> -e.
>
> Karl Lovink wrote:
>> Hello,
>>
>> I am in the process of setting up our SLURM environment. We want to use
>> SLURM during our DDoS exercises for dispatching DDoS attack scripts. We
>> need a lot of parallel running jobs on a total of 3 nodes. I can't get it
>> to run more than 128 jobs simultaneously. There are 128 cpu's in the
>> compute nodes.
>>
>> How can I ensure that I can run more jobs in parallel than there are
>> CPUs in the compute node?
>>
>> Thanks
>> Karl
>>
>> My srun script is:
>> srun --exclusive --nodes 3 --ntasks 384 /ddos/demo/showproc.sh
>>
>> And my slurm.conf file:
>> ClusterName=ddos-cluster
>> ControlMachine=slurm
>> SlurmUser=ddos
>> SlurmctldPort=6817
>> SlurmdPort=6818
>> AuthType=auth/munge
>> StateSaveLocation=/opt/slurm/spool/ctld
>> SlurmdSpoolDir=/opt/slurm/spool/d
>> SwitchType=switch/none
>> MpiDefault=none
>> SlurmctldPidFile=/opt/slurm/run/.pid
>> SlurmdPidFile=/opt/slurm/run/slurmd.pid
>> ProctrackType=proctrack/pgid
>> PluginDir=/opt/slurm/lib/slurm
>> ReturnToService=2
>> TaskPlugin=task/none
>> SlurmctldTimeout=300
>> SlurmdTimeout=300
>> InactiveLimit=0
>> MinJobAge=300
>> KillWait=30
>> Waittime=0
>> SchedulerType=sched/backfill
>>
>> SelectType=select/cons_tres
>> SelectTypeParameters=CR_Core
>>
>> SlurmctldDebug=3
>> SlurmctldLogFile=/opt/slurm/log/slurmctld.log
>> SlurmdDebug=3
>> SlurmdLogFile=/opt/slurm/log/slurmd.log
>> JobCompType=jobcomp/none
>> JobAcctGatherType=jobacct_gather/none
>> AccountingStorageTRES=gres/gpu
>> DebugFlags=CPU_Bind,gres
>> AccountingStorageType=accounting_storage/slurmdbd
>> AccountingStorageHost=localhost
>> AccountingStoragePass=/var/run/munge/munge.socket.2
>> AccountingStorageUser=slurm
>> SlurmctldParameters=enable_configurable
>> GresTypes=gpu
>> DefMemPerNode=256000
>> NodeName=aivd CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=4 RealMemory=261562 State=UNKNOWN
>> NodeName=mivd CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=4 RealMemory=261562 State=UNKNOWN
>> NodeName=fiod CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=4 RealMemory=261562 State=UNKNOWN
>> PartitionName=ddos Nodes=ALL Default=YES MaxTime=INFINITE State=UP
>> PartitionName=adhoc Nodes=ALL Default=YES MaxTime=INFINITE State=UP
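If I read the quoted --overcommit description correctly, adding it to my original command should let the 384 tasks start even though each node only has 128 CPUs. This is what I plan to try next (untested on my side yet, and I am not sure whether --exclusive is still useful together with it, so I have dropped that flag for now):

    srun --overcommit --nodes 3 --ntasks 384 /ddos/demo/showproc.sh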