Sorry to hear that. Hopefully others in the group have some ideas/explanations. I haven't had to deal with GPU resources in Slurm.
On Fri, Jan 13, 2023 at 4:51 AM Helder Daniel <hdan...@ualg.pt> wrote:

> Oh, ok.
> I guess I was expecting that the GPU job would be suspended by copying its
> GPU memory to RAM.
>
> I also tried REQUEUE,GANG and CANCEL,GANG.
>
> None of these options seems able to preempt GPU jobs.
>
> On Fri, 13 Jan 2023 at 12:30, Kevin Broch <kbr...@rivosinc.com> wrote:
>
>> My guess is that this isn't possible with GANG,SUSPEND. GPU memory
>> isn't managed by Slurm, so the idea of suspending GPU memory so that
>> another job can use it simply isn't possible.
>>
>> On Fri, Jan 13, 2023 at 4:08 AM Helder Daniel <hdan...@ualg.pt> wrote:
>>
>>> Hi Kevin,
>>>
>>> I did an "scontrol show partition".
>>> OverSubscribe was not enabled.
>>> I enabled it in slurm.conf with:
>>>
>>> (...)
>>> GresTypes=gpu
>>> NodeName=asimov Gres=gpu:4 Sockets=1 CoresPerSocket=32 ThreadsPerCore=2 State=UNKNOWN
>>> PartitionName=asimov01 *OverSubscribe=FORCE* Nodes=asimov Default=YES MaxTime=INFINITE MaxNodes=1 DefCpuPerGPU=2 State=UP
>>>
>>> but now it works only with CPU jobs. It does not preempt GPU jobs.
>>> Launching 3 CPU-only jobs, each requiring 32 out of 64 cores, it preempts
>>> after the time slice as expected:
>>>
>>> sbatch --cpus-per-task=32 test-cpu.sh
>>>
>>>   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>>>     352  asimov01 cpu-only  hdaniel  R       0:58      1 asimov
>>>     353  asimov01 cpu-only  hdaniel  R       0:25      1 asimov
>>>     351  asimov01 cpu-only  hdaniel  S       0:36      1 asimov
>>>
>>> But launching 3 GPU jobs, each requiring 2 out of 4 GPUs, it does not
>>> preempt the first 2 that start running.
>>> It says that the 3rd job is pending on resources:
>>>
>>>   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>>>     356  asimov01      gpu  hdaniel PD       0:00      1 (Resources)
>>>     354  asimov01      gpu  hdaniel  R       3:05      1 asimov
>>>     355  asimov01      gpu  hdaniel  R       3:02      1 asimov
>>>
>>> Do I need to change anything else in the configuration to also support
>>> GPU gang scheduling?
>>> Thanks
>>>
>>> ============================================================================
>>> scontrol show partition asimov01
>>> PartitionName=asimov01
>>>    AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
>>>    AllocNodes=ALL Default=YES QoS=N/A
>>>    DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
>>>    MaxNodes=1 MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
>>>    Nodes=asimov
>>>    PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
>>>    OverTimeLimit=NONE PreemptMode=GANG,SUSPEND
>>>    State=UP TotalCPUs=64 TotalNodes=1 SelectTypeParameters=NONE
>>>    JobDefaults=DefCpuPerGPU=2
>>>    DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
>>>
>>> On Fri, 13 Jan 2023 at 11:16, Kevin Broch <kbr...@rivosinc.com> wrote:
>>>
>>>> The problem might be that OverSubscribe is not enabled? Without it, I
>>>> don't believe the time-slicing can be GANG scheduled.
>>>>
>>>> Can you do an "scontrol show partition" to verify that it is?
>>>>
>>>> On Thu, Jan 12, 2023 at 6:24 PM Helder Daniel <hdan...@ualg.pt> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to enable gang scheduling on a server with a 32-core CPU
>>>>> and 4 GPUs.
>>>>>
>>>>> However, using gang scheduling, the CPU jobs (or GPU jobs) are not
>>>>> being preempted after the time slice, which is set to 30 secs.
>>>>>
>>>>> Below is a snapshot of squeue. There are 3 jobs, each needing 32 cores.
>>>>> The first 2 jobs launched are never preempted. The 3rd job starves
>>>>> forever (or at least until one of the other 2 ends):
>>>>>
>>>>>   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>>>>>     313  asimov01 cpu-only  hdaniel PD       0:00      1 (Resources)
>>>>>     311  asimov01 cpu-only  hdaniel  R       1:52      1 asimov
>>>>>     312  asimov01 cpu-only  hdaniel  R       1:49      1 asimov
>>>>>
>>>>> The same happens with GPU jobs. If I launch 5 jobs, each requiring one
>>>>> GPU, the 5th job will never run. Preemption is not working with the
>>>>> specified time slice.
>>>>>
>>>>> I tried several combinations:
>>>>>
>>>>> SchedulerType=sched/builtin and backfill
>>>>> SelectType=select/cons_tres and linear
>>>>>
>>>>> I'd appreciate any help and suggestions.
>>>>> The slurm.conf is below.
>>>>> Thanks
>>>>>
>>>>> ClusterName=asimov
>>>>> SlurmctldHost=localhost
>>>>> MpiDefault=none
>>>>> ProctrackType=proctrack/linuxproc  # proctrack/cgroup
>>>>> ReturnToService=2
>>>>> SlurmctldPidFile=/var/run/slurmctld.pid
>>>>> SlurmctldPort=6817
>>>>> SlurmdPidFile=/var/run/slurmd.pid
>>>>> SlurmdPort=6818
>>>>> SlurmdSpoolDir=/var/lib/slurm/slurmd
>>>>> SlurmUser=slurm
>>>>> StateSaveLocation=/var/lib/slurm/slurmctld
>>>>> SwitchType=switch/none
>>>>> TaskPlugin=task/none  # task/cgroup
>>>>> #
>>>>> # TIMERS
>>>>> InactiveLimit=0
>>>>> KillWait=30
>>>>> MinJobAge=300
>>>>> SlurmctldTimeout=120
>>>>> SlurmdTimeout=300
>>>>> Waittime=0
>>>>> #
>>>>> # SCHEDULING
>>>>> #FastSchedule=1  # obsolete
>>>>> SchedulerType=sched/builtin  # backfill
>>>>> SelectType=select/cons_tres
>>>>> SelectTypeParameters=CR_Core  # CR_Core_Memory lets only one job run at a time
>>>>> PreemptType=preempt/partition_prio
>>>>> PreemptMode=SUSPEND,GANG
>>>>> SchedulerTimeSlice=30  # in seconds, default 30
>>>>> #
>>>>> # LOGGING AND ACCOUNTING
>>>>> #AccountingStoragePort=
>>>>> AccountingStorageType=accounting_storage/none
>>>>> #AccountingStorageEnforce=associations
>>>>> #ClusterName=bip-cluster
>>>>> JobAcctGatherFrequency=30
>>>>> JobAcctGatherType=jobacct_gather/linux
>>>>> SlurmctldDebug=info
>>>>> SlurmctldLogFile=/var/log/slurm/slurmctld.log
>>>>> SlurmdDebug=info
>>>>> SlurmdLogFile=/var/log/slurm/slurmd.log
>>>>> #
>>>>> #
>>>>> # COMPUTE NODES
>>>>> #NodeName=asimov CPUs=64 RealMemory=500 State=UNKNOWN
>>>>> #PartitionName=LocalQ Nodes=ALL Default=YES MaxTime=INFINITE State=UP
>>>>>
>>>>> # Partitions
>>>>> GresTypes=gpu
>>>>> NodeName=asimov Gres=gpu:4 Sockets=1 CoresPerSocket=32 ThreadsPerCore=2 State=UNKNOWN
>>>>> PartitionName=asimov01 Nodes=asimov Default=YES MaxTime=INFINITE MaxNodes=1 DefCpuPerGPU=2 State=UP
>>>>>
>>>
>>> --
>>> Best regards,
>>>
>>> Helder Daniel
>>> Universidade do Algarve
>>> Faculdade de Ciências e Tecnologia
>>> Departamento de Engenharia Electrónica e Informática
>>> https://www.ualg.pt/pt/users/hdaniel
>>
>
> --
> Best regards,
>
> Helder Daniel
> Universidade do Algarve
> Faculdade de Ciências e Tecnologia
> Departamento de Engenharia Electrónica e Informática
> https://www.ualg.pt/pt/users/hdaniel
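One configuration detail worth checking in this setup: a node that advertises
Gres=gpu:4 in slurm.conf normally also needs the GPU devices enumerated in
gres.conf on the compute node (unless GPU autodetection is configured there).
A minimal sketch follows, assuming four NVIDIA GPUs at the usual device paths
on asimov; the File= pattern is an assumption and should be adjusted to the
actual devices:

    # /etc/slurm/gres.conf on asimov (sketch; device paths are assumed)
    NodeName=asimov Name=gpu File=/dev/nvidia[0-3]

With this in place, jobs can request GPUs with --gres=gpu:N and Slurm binds
each job to specific devices rather than just counting them.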
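The GPU test jobs in the thread each request 2 of the node's 4 GPUs. A minimal
batch script for reproducing that test might look like the sketch below; the
script name and the payload command are hypothetical, not taken from the
thread:

    #!/bin/bash
    #SBATCH --job-name=gpu
    #SBATCH --partition=asimov01
    #SBATCH --gres=gpu:2        # each job takes 2 of the node's 4 GPUs
    #SBATCH --cpus-per-task=2   # in line with DefCpuPerGPU=2 on the partition

    # Hypothetical payload: any long-running GPU workload will do for the test.
    srun ./my_gpu_benchmark

Submitted three times (e.g. sbatch test-gpu.sh), only two copies fit on the
node at once, which is the situation shown in the squeue output above: the
third job stays pending on (Resources) instead of being gang time-sliced.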