Very interesting issue. I am guessing there might be a workaround: since oryx has two GPUs, you could define both of them but disable (or simply never request) the GT 710. Does Slurm support this?
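As a rough sketch of what I mean (this is an assumption, not something I have tested on your cluster, and I am guessing the GT 710 sits at /dev/nvidia0):

```
# gres.conf on oryx: list BOTH devices so Slurm tracks every GPU
Nodename=oryx Name=gpu Type=GT710     File=/dev/nvidia0
Nodename=oryx Name=gpu Type=RTX2080TI File=/dev/nvidia1

# slurm.conf: advertise both types
NodeName=oryx CoreSpecCount=2 CPUs=8 RealMemory=64000 Gres=gpu:GT710:1,gpu:RTX2080TI:1
```

Jobs would then request the compute GPU by type, e.g. `sbatch --gres=gpu:RTX2080TI:1`, so the display GPU is never handed out. Whether Slurm then actually hides the unrequested device from the job is the open question.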
Best,
Feng

On Tue, Jun 27, 2023 at 9:54 AM Wilson, Steven M <ste...@purdue.edu> wrote:
>
> Hi,
>
> I manually configure the GPUs in our Slurm configuration (AutoDetect=off in
> gres.conf) and everything works fine when all the GPUs in a node are
> configured in gres.conf and available to Slurm. But we have some nodes where
> a GPU is reserved for running the display and is specifically not configured
> in gres.conf. In these cases, Slurm includes this unconfigured GPU and makes
> it available to Slurm jobs. Using a simple Slurm job that executes
> "nvidia-smi -L", it will display the unconfigured GPU along with as many
> configured GPUs as requested by the job.
>
> For example, in a node configured with this line in slurm.conf:
> NodeName=oryx CoreSpecCount=2 CPUs=8 RealMemory=64000 Gres=gpu:RTX2080TI:1
> and this line in gres.conf:
> Nodename=oryx Name=gpu Type=RTX2080TI File=/dev/nvidia1
> I will get the following results from a job running "nvidia-smi -L" that
> requested a single GPU:
> GPU 0: NVIDIA GeForce GT 710 (UUID: GPU-21fe15f0-d8b9-b39e-8ada-8c1c8fba8a1e)
> GPU 1: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-0dc4da58-5026-6173-1156-c4559a268bf5)
>
> But in another node that has all GPUs configured in Slurm like this in
> slurm.conf:
> NodeName=beluga CoreSpecCount=1 CPUs=16 RealMemory=128500 Gres=gpu:TITANX:2
> and this line in gres.conf:
> Nodename=beluga Name=gpu Type=TITANX File=/dev/nvidia[0-1]
> I get the expected results from the job running "nvidia-smi -L" that
> requested a single GPU:
> GPU 0: NVIDIA RTX A5500 (UUID: GPU-3754c069-799e-2027-9fbb-ff90e2e8e459)
>
> I'm running Slurm 22.05.5.
>
> Thanks in advance for any suggestions to help correct this problem!
>
> Steve