Dear Brian,

Currently, 5 of the 8 GPUs are allocated (so 3 should be free).
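In case it is easier to read than the full scontrol dump below: as far as I understand the sinfo man page, the --Format fields Gres and GresUsed show configured vs. in-use GRES per node, so I can also send the output of something like:

rpcruz@atlas:~$ sinfo -O NodeList,Gres,GresUsed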
rpcruz@atlas:~$ /usr/bin/srun --gres=shard:2 ls
srun: job 515 queued and waiting for resources

The job shows as PD in squeue. scontrol says that 5 GPUs are allocated out of 8...

rpcruz@atlas:~$ scontrol show node compute01
NodeName=compute01 Arch=x86_64 CoresPerSocket=32
   CPUAlloc=80 CPUEfctv=128 CPUTot=128 CPULoad=65.38
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   *Gres=gpu:8,shard:32*
   NodeAddr=compute01 NodeHostName=compute01 Version=23.11.4
   OS=Linux 6.8.0-36-generic #36-Ubuntu SMP PREEMPT_DYNAMIC Mon Jun 10 10:49:14 UTC 2024
   RealMemory=1031887 AllocMem=644925 FreeMem=701146 Sockets=2 Boards=1
   State=MIXED ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=partition
   BootTime=2024-07-02T14:08:37 SlurmdStartTime=2024-07-02T14:08:51
   LastBusyTime=2024-07-03T12:02:11 ResumeAfterTime=None
   *CfgTRES=cpu=128,mem=1031887M,billing=128,gres/gpu=8
   AllocTRES=cpu=80,mem=644925M,gres/gpu=5*
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/a ExtSensorsWatts=0 ExtSensorsTemp=n/a

rpcruz@atlas:~$ sinfo
PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
partition*    up 5-00:00:00      1    mix  compute01

The output is the same whether "srun --gres=shard:2" is pending or not.

I wonder if the problem is that CfgTRES is not showing gres/shard... it sounds like it should, right? (I put a guess about that in the PS at the bottom, below the quoted thread.)

The complete last part of my /etc/slurm/slurm.conf (which is of course the same on the login and compute nodes):

# COMPUTE NODES
GresTypes=gpu,shard
NodeName=compute01 Gres=gpu:8,shard:32 CPUs=128 RealMemory=1031887 Sockets=2 CoresPerSocket=32 ThreadsPerCore=2 State=UNKNOWN
PartitionName=partition Nodes=ALL Default=YES MaxTime=5-00:00:00 State=UP DefCpuPerGPU=16 DefMemPerGPU=128985

And in the compute node, /etc/slurm/gres.conf is:

Name=gpu File=/dev/nvidia[0-7]
Name=shard Count=32

Thank you!
--
Ricardo Cruz - https://rpmcruz.github.io

Brian Andrus via slurm-users <slurm-users@lists.schedmd.com> wrote (Thursday, 04/07/2024 at 17:16):

> To help dig into it, can you paste the full output of scontrol show node
> compute01 while the job is pending? Also 'sinfo' would be good.
>
> It is basically telling you there aren't enough resources in the partition
> to run the job. Often this is because all the nodes are in use at that
> moment.
>
> Brian Andrus
>
> On 7/4/2024 8:43 AM, Ricardo Cruz via slurm-users wrote:
>
> Greetings,
>
> There are not many questions regarding GPU sharding here, and I am unsure
> if I am using it correctly... I have configured it according to the
> instructions <https://slurm.schedmd.com/gres.html>, and it seems to be
> configured properly:
>
> $ scontrol show node compute01
> NodeName=compute01 Arch=x86_64 CoresPerSocket=32
>    CPUAlloc=48 CPUEfctv=128 CPUTot=128 CPULoad=10.95
>    AvailableFeatures=(null)
>    ActiveFeatures=(null)
>    *Gres=gpu:8,shard:32*
> [truncated]
>
> When running with gres:gpu everything works perfectly:
>
> $ /usr/bin/srun --gres=gpu:2 ls
> srun: job 192 queued and waiting for resources
> srun: job 192 has been allocated resources
> (...)
> However, when using sharding, it just stays waiting indefinitely:
>
> $ /usr/bin/srun --gres=shard:2 ls
> srun: job 193 queued and waiting for resources
>
> The reason it gives for pending is just "Resources":
>
> $ scontrol show job 193
> JobId=193 JobName=ls
>    UserId=rpcruz(1000) GroupId=rpcruz(1000) MCS_label=N/A
>    Priority=1 Nice=0 Account=account QOS=normal
>    *JobState=PENDING Reason=Resources Dependency=(null)*
>    Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
>    RunTime=00:00:00 TimeLimit=2-00:00:00 TimeMin=N/A
>    SubmitTime=2024-06-28T05:36:51 EligibleTime=2024-06-28T05:36:51
>    AccrueTime=2024-06-28T05:36:51
>    StartTime=2024-06-29T18:13:22 EndTime=2024-07-01T18:13:22 Deadline=N/A
>    SuspendTime=None SecsPreSuspend=0 LastSchedEval=2024-06-28T05:37:20
>    Scheduler=Backfill:*
>    Partition=partition AllocNode:Sid=localhost:47757
>    ReqNodeList=(null) ExcNodeList=(null)
>    NodeList=
>    NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>    ReqTRES=cpu=1,mem=1031887M,node=1,billing=1
>    AllocTRES=(null)
>    Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
>    MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
>    Features=(null) DelayBoot=00:00:00
>    OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
>    Command=ls
>    WorkDir=/home/rpcruz
>    Power=
>    *TresPerNode=gres/shard:2*
>
> Again, I think I have configured it properly - it shows up correctly in
> scontrol (as shown above).
> Our setup is pretty simple - I just added shard to /etc/slurm/slurm.conf:
> GresTypes=gpu,shard
> NodeName=compute01 Gres=gpu:8,shard:32 [truncated]
> Our /etc/slurm/gres.conf is also straight-forward: (it works fine for
> --gres=gpu:1)
> Name=gpu File=/dev/nvidia[0-7]
> Name=shard Count=32
>
> Maybe I am just running srun improperly? Shouldn't it just be srun
> --gres=shard:2 to allocate half of a GPU? (since I am using 32 shards
> for the 8 gpus, so it's 4 shards per gpu)
>
> Thank you very much for your attention,
> --
> Ricardo Cruz - https://rpmcruz.github.io
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
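PS: One guess from my side, in case it is relevant to the pending job: CfgTRES above only lists gres/gpu, not gres/shard. If I am reading the sharding section of https://slurm.schedmd.com/gres.html correctly, shards are only tracked as a TRES when they are added to AccountingStorageTRES, so perhaps a line like this is what is missing from my slurm.conf (untested on my end, and I may well be misreading the docs):

AccountingStorageTRES=gres/gpu,gres/shard

I assume that would need a restart of slurmctld/slurmd rather than just scontrol reconfigure, but I have not tried it yet. Does that sound plausible, or is the CfgTRES display unrelated to why the job stays pending?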