PS: I checked the resources while running the 3 GPU jobs, which were launched with:

sbatch --gpus-per-task=2 --cpus-per-task=1 cnn-multi.sh
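(cnn-multi.sh itself is not included in the thread; a minimal sketch of such a batch script, assuming a hypothetical training entry point cnn.py, might look like this:)

#!/bin/bash
#SBATCH --job-name=gpu          # the GPU jobs show up as "gpu" in squeue below
#SBATCH --gpus-per-task=2       # same values as passed on the sbatch command line above
#SBATCH --cpus-per-task=1
# Slurm normally exports CUDA_VISIBLE_DEVICES restricted to the allocated GPUs,
# so the Python process only sees the 2 GPUs assigned to this job.
srun python cnn.py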
The server has 64 cores (32 x 2 with hyperthreading):

cat /proc/cpuinfo | grep processor | tail -n1
processor       : 63

128 GB of main memory:

hdaniel@asimov:~/Works/Turbines/02-CNN$ cat /proc/meminfo
MemTotal:       131725276 kB
MemFree:        106773356 kB
MemAvailable:   109398780 kB
Buffers:          161012 kB
(...)

And 4 GPUs, each with 16 GB of memory:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A4000    On   | 00000000:41:00.0 Off |                  Off |
| 45%   63C    P2    47W / 140W |  15370MiB / 16376MiB |     14%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A4000    On   | 00000000:42:00.0 Off |                  Off |
| 44%   63C    P2    45W / 140W |  15370MiB / 16376MiB |     14%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA RTX A4000    On   | 00000000:61:00.0 Off |                  Off |
| 50%   68C    P2    52W / 140W |  15370MiB / 16376MiB |     15%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA RTX A4000    On   | 00000000:62:00.0 Off |                  Off |
| 46%   64C    P2    47W / 140W |  15370MiB / 16376MiB |     14%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2146      G   /usr/lib/xorg/Xorg                  9MiB |
|    0   N/A  N/A      2472      G   /usr/bin/gnome-shell                4MiB |
|    0   N/A  N/A    524228      C   /bin/python                     15352MiB |
|    1   N/A  N/A      2146      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A    524228      C   /bin/python                     15362MiB |
|    2   N/A  N/A      2146      G   /usr/lib/xorg/Xorg                  4MiB |
|    2   N/A  N/A    524226      C   /bin/python                     15362MiB |
|    3   N/A  N/A      2146      G   /usr/lib/xorg/Xorg                  4MiB |
|    3   N/A  N/A    524226      C   /bin/python                     15362MiB |
+-----------------------------------------------------------------------------+
On Fri, 13 Jan 2023 at 12:08, Helder Daniel <hdan...@ualg.pt> wrote:

> Hi Kevin
>
> I did a "scontrol show partition".
> OverSubscribe was not enabled.
> I enabled it in slurm.conf with:
>
> (...)
> GresTypes=gpu
> NodeName=asimov Gres=gpu:4 Sockets=1 CoresPerSocket=32 ThreadsPerCore=2 State=UNKNOWN
> PartitionName=asimov01 *OverSubscribe=FORCE* Nodes=asimov Default=YES MaxTime=INFINITE MaxNodes=1 DefCpuPerGPU=2 State=UP
>
> but now it is working only for CPU jobs. It does not preempt GPU jobs.
> Launching 3 CPU-only jobs, each requiring 32 out of 64 cores, they are preempted after the timeslice as expected:
>
> sbatch --cpus-per-task=32 test-cpu.sh
>
> JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>   352  asimov01 cpu-only  hdaniel  R       0:58      1 asimov
>   353  asimov01 cpu-only  hdaniel  R       0:25      1 asimov
>   351  asimov01 cpu-only  hdaniel  S       0:36      1 asimov
>
> But launching 3 GPU jobs, each requiring 2 out of 4 GPUs, it does not
> preempt the first 2 that start running.
> The 3rd job is left pending on resources:
>
> JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>   356  asimov01      gpu  hdaniel PD       0:00      1 (Resources)
>   354  asimov01      gpu  hdaniel  R       3:05      1 asimov
>   355  asimov01      gpu  hdaniel  R       3:02      1 asimov
>
> Do I need to change anything else in the configuration to also support GPU gang scheduling?
> Thanks
>
> ============================================================================
> scontrol show partition asimov01
> PartitionName=asimov01
>    AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
>    AllocNodes=ALL Default=YES QoS=N/A
>    DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
>    MaxNodes=1 MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
>    Nodes=asimov
>    PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
>    OverTimeLimit=NONE PreemptMode=GANG,SUSPEND
>    State=UP TotalCPUs=64 TotalNodes=1 SelectTypeParameters=NONE
>    JobDefaults=DefCpuPerGPU=2
>    DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
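(For what it's worth, the scontrol output pasted above still shows OverSubscribe=NO for the partition; if it was captured after editing slurm.conf, the change may not have been picked up yet. A quick way to confirm what the running daemons are actually using, for example after an scontrol reconfigure or a slurmctld restart, is:)

scontrol show config | grep -E 'PreemptMode|PreemptType|SchedulerTimeSlice'
scontrol show partition asimov01 | grep -E 'OverSubscribe|PreemptMode'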
On Fri, 13 Jan 2023 at 11:16, Kevin Broch <kbr...@rivosinc.com> wrote:

>> Problem might be that OverSubscribe is not enabled? w/o it, I don't
>> believe the time-slicing can be GANG scheduled.
>>
>> Can you do a "scontrol show partition" to verify that it is?
>>
>> On Thu, Jan 12, 2023 at 6:24 PM Helder Daniel <hdan...@ualg.pt> wrote:
>>
>>> Hi,
>>>
>>> I am trying to enable gang scheduling on a server with a 32-core CPU
>>> and 4 GPUs.
>>>
>>> However, using gang scheduling, the CPU jobs (or GPU jobs) are not being
>>> preempted after the time slice, which is set to 30 secs.
>>>
>>> Below is a snapshot of squeue. There are 3 jobs, each needing 32 cores.
>>> The first 2 jobs launched are never preempted. The 3rd job starves
>>> forever (or at least until one of the other 2 ends):
>>>
>>> JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>>>   313  asimov01 cpu-only  hdaniel PD       0:00      1 (Resources)
>>>   311  asimov01 cpu-only  hdaniel  R       1:52      1 asimov
>>>   312  asimov01 cpu-only  hdaniel  R       1:49      1 asimov
>>>
>>> The same happens with GPU jobs. If I launch 5 jobs, requiring one GPU
>>> each, the 5th job will never run. The preemption is not working with the
>>> specified timeslice.
>>>
>>> I tried several combinations:
>>>
>>> SchedulerType=sched/builtin and backfill
>>> SelectType=select/cons_tres and linear
>>>
>>> I'll appreciate any help and suggestions.
>>> The slurm.conf is below.
>>> Thanks
>>>
>>> ClusterName=asimov
>>> SlurmctldHost=localhost
>>> MpiDefault=none
>>> ProctrackType=proctrack/linuxproc   # proctrack/cgroup
>>> ReturnToService=2
>>> SlurmctldPidFile=/var/run/slurmctld.pid
>>> SlurmctldPort=6817
>>> SlurmdPidFile=/var/run/slurmd.pid
>>> SlurmdPort=6818
>>> SlurmdSpoolDir=/var/lib/slurm/slurmd
>>> SlurmUser=slurm
>>> StateSaveLocation=/var/lib/slurm/slurmctld
>>> SwitchType=switch/none
>>> TaskPlugin=task/none   # task/cgroup
>>> #
>>> # TIMERS
>>> InactiveLimit=0
>>> KillWait=30
>>> MinJobAge=300
>>> SlurmctldTimeout=120
>>> SlurmdTimeout=300
>>> Waittime=0
>>> #
>>> # SCHEDULING
>>> #FastSchedule=1   # obsolete
>>> SchedulerType=sched/builtin   # backfill
>>> SelectType=select/cons_tres
>>> SelectTypeParameters=CR_Core   # CR_Core_Memory lets only one job run at a time
>>> PreemptType = preempt/partition_prio
>>> PreemptMode = SUSPEND,GANG
>>> SchedulerTimeSlice=30   # in seconds, default 30
>>> #
>>> # LOGGING AND ACCOUNTING
>>> #AccountingStoragePort=
>>> AccountingStorageType=accounting_storage/none
>>> #AccountingStorageEnforce=associations
>>> #ClusterName=bip-cluster
>>> JobAcctGatherFrequency=30
>>> JobAcctGatherType=jobacct_gather/linux
>>> SlurmctldDebug=info
>>> SlurmctldLogFile=/var/log/slurm/slurmctld.log
>>> SlurmdDebug=info
>>> SlurmdLogFile=/var/log/slurm/slurmd.log
>>> #
>>> #
>>> # COMPUTE NODES
>>> #NodeName=asimov CPUs=64 RealMemory=500 State=UNKNOWN
>>> #PartitionName=LocalQ Nodes=ALL Default=YES MaxTime=INFINITE State=UP
>>>
>>> # Partitions
>>> GresTypes=gpu
>>> NodeName=asimov Gres=gpu:4 Sockets=1 CoresPerSocket=32 ThreadsPerCore=2 State=UNKNOWN
>>> PartitionName=asimov01 Nodes=asimov Default=YES MaxTime=INFINITE MaxNodes=1 DefCpuPerGPU=2 State=UP
>>>
>
> --
> with best regards,
>
> Helder Daniel
> Universidade do Algarve
> Faculdade de Ciências e Tecnologia
> Departamento de Engenharia Electrónica e Informática
> https://www.ualg.pt/pt/users/hdaniel

--
with best regards,

Helder Daniel
Universidade do Algarve
Faculdade de Ciências e Tecnologia
Departamento de Engenharia Electrónica e Informática
https://www.ualg.pt/pt/users/hdaniel