I would try explicitly specifying CPUs and memory, just to be sure the job isn't requesting 0 (or all) of them.
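For example, something along these lines (the values here are just placeholders, use whatever the job actually needs):

srun --gres=shard:2 --cpus-per-task=1 --mem=4G ls

If that starts right away, the pending job was most likely falling back to the default "all the memory on the node" request that Brian mentioned.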
Also, I ran into a weird issue in my lab cluster when I had OverSubscribe=YES:2 set while playing with shards: jobs would go pending on Resources even though no GPUs or shards were allocated. Once I reverted to my usual OverSubscribe=FORCE:1, it behaved as expected. You may also want to make sure there isn't a job_submit script intercepting the gres requests.
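In slurm.conf terms, what I mean is something along the lines of your existing partition definition with OverSubscribe set explicitly (treat this as a sketch, not a drop-in line):

PartitionName=partition Nodes=ALL Default=YES MaxTime=5-00:00:00 State=UP DefCpuPerGPU=16 DefMemPerGPU=128985 OverSubscribe=FORCE:1

And a quick way to check whether any job_submit plugin is loaded at all (if JobSubmitPlugins comes back empty/(null), nothing is rewriting gres requests at submit time):

scontrol show config | grep -i JobSubmitPlugins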
> On Jul 4, 2024, at 12:09 PM, Brian Andrus via slurm-users <slurm-users@lists.schedmd.com> wrote:
>
> Just a thought.
>
> Try specifying some memory. It looks like the running jobs do that and by default, if not specified, it is "all the memory on the node", so it can't start because some of it is taken.
>
> Brian Andrus
>
> On 7/4/2024 9:54 AM, Ricardo Cruz wrote:
>> Dear Brian,
>>
>> Currently, we have 5 GPUs available (out of 8).
>>
>> rpcruz@atlas:~$ /usr/bin/srun --gres=shard:2 ls
>> srun: job 515 queued and waiting for resources
>>
>> The job shows as PD in squeue.
>> scontrol says that 5 GPUs are allocated out of 8...
>>
>> rpcruz@atlas:~$ scontrol show node compute01
>> NodeName=compute01 Arch=x86_64 CoresPerSocket=32
>>    CPUAlloc=80 CPUEfctv=128 CPUTot=128 CPULoad=65.38
>>    AvailableFeatures=(null)
>>    ActiveFeatures=(null)
>>    Gres=gpu:8,shard:32
>>    NodeAddr=compute01 NodeHostName=compute01 Version=23.11.4
>>    OS=Linux 6.8.0-36-generic #36-Ubuntu SMP PREEMPT_DYNAMIC Mon Jun 10 10:49:14 UTC 2024
>>    RealMemory=1031887 AllocMem=644925 FreeMem=701146 Sockets=2 Boards=1
>>    State=MIXED ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>>    Partitions=partition
>>    BootTime=2024-07-02T14:08:37 SlurmdStartTime=2024-07-02T14:08:51
>>    LastBusyTime=2024-07-03T12:02:11 ResumeAfterTime=None
>>    CfgTRES=cpu=128,mem=1031887M,billing=128,gres/gpu=8
>>    AllocTRES=cpu=80,mem=644925M,gres/gpu=5
>>    CapWatts=n/a
>>    CurrentWatts=0 AveWatts=0
>>    ExtSensorsJoules=n/a ExtSensorsWatts=0 ExtSensorsTemp=n/a
>>
>> rpcruz@atlas:~$ sinfo
>> PARTITION  AVAIL  TIMELIMIT  NODES  STATE NODELIST
>> partition*    up 5-00:00:00      1    mix compute01
>>
>> The output is the same, independent of whether "srun --gres=shard:2" is pending or not.
>> I wonder if the problem is that CfgTRES is not showing gres/shard ... it sounds like it should, right?
>>
>> The complete last part of my /etc/slurm/slurm.conf (which is of course the same on the login and compute node):
>>
>> # COMPUTE NODES
>> GresTypes=gpu,shard
>> NodeName=compute01 Gres=gpu:8,shard:32 CPUs=128 RealMemory=1031887 Sockets=2 CoresPerSocket=32 ThreadsPerCore=2 State=UNKNOWN
>> PartitionName=partition Nodes=ALL Default=YES MaxTime=5-00:00:00 State=UP DefCpuPerGPU=16 DefMemPerGPU=128985
>>
>> And in the compute node /etc/slurm/gres.conf is:
>> Name=gpu File=/dev/nvidia[0-7]
>> Name=shard Count=32
>>
>> Thank you!
>> --
>> Ricardo Cruz - https://rpmcruz.github.io
>>
>> Brian Andrus via slurm-users <slurm-users@lists.schedmd.com> wrote (Thursday, 4/07/2024 at 17:16):
>>> To help dig into it, can you paste the full output of scontrol show node compute01 while the job is pending? Also 'sinfo' would be good.
>>>
>>> It is basically telling you there aren't enough resources in the partition to run the job. Often this is because all the nodes are in use at that moment.
>>>
>>> Brian Andrus
>>>
>>> On 7/4/2024 8:43 AM, Ricardo Cruz via slurm-users wrote:
>>>> Greetings,
>>>>
>>>> There are not many questions regarding GPU sharding here, and I am unsure if I am using it correctly... I have configured it according to the instructions <https://slurm.schedmd.com/gres.html>, and it seems to be configured properly:
>>>>
>>>> $ scontrol show node compute01
>>>> NodeName=compute01 Arch=x86_64 CoresPerSocket=32
>>>>    CPUAlloc=48 CPUEfctv=128 CPUTot=128 CPULoad=10.95
>>>>    AvailableFeatures=(null)
>>>>    ActiveFeatures=(null)
>>>>    Gres=gpu:8,shard:32
>>>> [truncated]
>>>>
>>>> When running with gres:gpu everything works perfectly:
>>>>
>>>> $ /usr/bin/srun --gres=gpu:2 ls
>>>> srun: job 192 queued and waiting for resources
>>>> srun: job 192 has been allocated resources
>>>> (...)
>>>>
>>>> However, when using sharding, it just stays waiting indefinitely:
>>>>
>>>> $ /usr/bin/srun --gres=shard:2 ls
>>>> srun: job 193 queued and waiting for resources
>>>>
>>>> The reason it gives for pending is just "Resources":
>>>>
>>>> $ scontrol show job 193
>>>> JobId=193 JobName=ls
>>>>    UserId=rpcruz(1000) GroupId=rpcruz(1000) MCS_label=N/A
>>>>    Priority=1 Nice=0 Account=account QOS=normal
>>>>    JobState=PENDING Reason=Resources Dependency=(null)
>>>>    Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
>>>>    RunTime=00:00:00 TimeLimit=2-00:00:00 TimeMin=N/A
>>>>    SubmitTime=2024-06-28T05:36:51 EligibleTime=2024-06-28T05:36:51
>>>>    AccrueTime=2024-06-28T05:36:51
>>>>    StartTime=2024-06-29T18:13:22 EndTime=2024-07-01T18:13:22 Deadline=N/A
>>>>    SuspendTime=None SecsPreSuspend=0 LastSchedEval=2024-06-28T05:37:20 Scheduler=Backfill:*
>>>>    Partition=partition AllocNode:Sid=localhost:47757
>>>>    ReqNodeList=(null) ExcNodeList=(null)
>>>>    NodeList=
>>>>    NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>>>>    ReqTRES=cpu=1,mem=1031887M,node=1,billing=1
>>>>    AllocTRES=(null)
>>>>    Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
>>>>    MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
>>>>    Features=(null) DelayBoot=00:00:00
>>>>    OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
>>>>    Command=ls
>>>>    WorkDir=/home/rpcruz
>>>>    Power=
>>>>    TresPerNode=gres/shard:2
>>>>
>>>> Again, I think I have configured it properly - it shows up correctly in scontrol (as shown above).
>>>> Our setup is pretty simple - I just added shard to /etc/slurm/slurm.conf:
>>>> GresTypes=gpu,shard
>>>> NodeName=compute01 Gres=gpu:8,shard:32 [truncated]
>>>> Our /etc/slurm/gres.conf is also straightforward (it works fine for --gres=gpu:1):
>>>> Name=gpu File=/dev/nvidia[0-7]
>>>> Name=shard Count=32
>>>>
>>>> Maybe I am just running srun improperly? Shouldn't it just be srun --gres=shard:2 to allocate half of a GPU? (Since I am using 32 shards for the 8 GPUs, that is 4 shards per GPU.)
>>>>
>>>> Thank you very much for your attention,
>>>> --
>>>> Ricardo Cruz - https://rpmcruz.github.io

Reed
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com