[slurm-users] GPU Gres Type inconsistencies

2023-06-19 Thread Ben Roberts
Hi all,

I'm trying to set up GPU Gres Types to correctly identify the installed 
hardware (generation and memory size). I'm using a mix of explicit 
configuration (to set a friendly type name) and autodetection (to handle 
core and link detection). I'm seeing two related issues which I don't 
understand.

  1.  The output of `scontrol show node` references `Gres=gpu:tesla:2` instead 
of the type I'm specifying in the config file (`v100s-pcie-32gb`)
  2.  Attempts to schedule jobs using a generic `--gpus 1` work fine, but 
attempts to specify the gpu type (e.g. with `--gres gpu:v100s-pcie-32gb:1`; see 
the illustration after this list) fail with `error: Unable to allocate resources: 
Requested node configuration is not available`
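
For illustration, the two kinds of submission from point 2 (srun shown here):

# point 2 illustrated
srun --gpus 1 hostname                       # allocates fine
srun --gres gpu:v100s-pcie-32gb:1 hostname   # error: Unable to allocate resources: Requested node configuration is not available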

If I've understood the documentation 
(https://slurm.schedmd.com/gres.conf.html#OPT_Type) correctly, I should be able 
to use any substring of the name nvml detects for the card 
(`tesla_v100s-pcie-32gb`) as the Type string. With the gres debug flag set, I 
can see that the GPUs are detected and matched up with the static entries in 
gres.conf correctly. I don't see any mention of Type=tesla in the logs, so I'm 
at a loss as to why `scontrol show node` is reporting `gpu:tesla` instead of 
`gpu:v100s-pcie-32gb` as configured. I presume this mismatch is the cause of the 
scheduling failure: although the job spec matches the configured gpu type and 
should be schedulable, the scheduler doesn't actually see any resources of that 
type available to run on.
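
For reference, the gres debug output further down comes from settings along 
these lines (a sketch; the flag can also be enabled at runtime with 
`scontrol setdebugflags +gres`):

# /etc/slurm/slurm.conf debug settings (sketch)
DebugFlags=Gres
SlurmdDebug=debug2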

The "tesla" string is the first "word" of the autodetected type, but I can't 
see why it would be being truncated to just this rather than using the whole 
string. I did previously use the type "tesla" in the config, which worked fine 
since everything matched up, but since does not adequately describe the 
hardware so I need to change this to be more specific. Is there anywhere other 
than slurm.conf or gres.conf where the old gpu type might be persisted and need 
purging?
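
For completeness, the kind of check I'm using to see which gpu types are 
currently visible looks roughly like this (the sacctmgr invocation is a sketch 
based on its man page):

# sketch: inspect which gpu types the controller and accounting currently report
scontrol show node gpu2 | grep -i -e gres -e tres
sacctmgr show tres format=Type,Name%30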

I've tried using `scontrol update node=gpu2 gres=gpu:v100s-pcie-32gb:0` to 
manually change the gres type (trying to set the number of GPUs to 2 here is 
rejected, but 0 is accepted). `scontrol reconfig` then causes the `scontrol 
show node` output to update to `Gres=gpu:v100s-pcie-32gb:2` as expected, but 
removes the gpus from CfgTRES. After restarting slurmctld, the Gres and 
CfgTRES entries briefly match up for all nodes, but very shortly afterwards the 
Gres entries revert to `Gres=gpu:tesla:0` again, so I'm back to square one.
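
That sequence, as a sketch (restarting via systemd is an assumption about my 
setup; use whatever applies):

# sketch of the manual workaround sequence described above
scontrol update NodeName=gpu2 Gres=gpu:v100s-pcie-32gb:0   # a count of 2 is rejected, 0 is accepted
scontrol reconfig                                          # Gres updates, but the gpus drop out of CfgTRES
systemctl restart slurmctld                                # Gres and CfgTRES briefly match, then revert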

I've also tried using the full `tesla_v100s-pcie-32gb` string as the type, but 
this has no effect; the gres type is still reported as `gpu:tesla`. This is 
all with Slurm 23.02.3, on Rocky Linux 8.8, using 
cuda-nvml-devel-12-0-12.0.140-1.x86_64. Excerpts from the configs and logs are 
shown below.
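
For reference, the alternative gres.conf tried in that case (with the matching 
type change in the node's Gres= line in slurm.conf):

# /etc/slurm/gres.conf (alternative also tried: full autodetected name as the Type)
Name=gpu Type=tesla_v100s-pcie-32gb File=/dev/nvidia0
Name=gpu Type=tesla_v100s-pcie-32gb File=/dev/nvidia1
AutoDetect=nvml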

Can anyone point me in the right direction on how to solve this? Thanks,

# /etc/slurm/gres.conf
Name=gpu Type=v100s-pcie-32gb File=/dev/nvidia0
Name=gpu Type=v100s-pcie-32gb File=/dev/nvidia1
AutoDetect=nvml

# /etc/slurm/slurm.conf (identical on all nodes)
AccountingStorageTRES=gres/gpu,gres/gpu:v100s-pcie-32gb,gres/gpu:v100-pcie-32gb
EnforcePartLimits=ANY
GresTypes=gpu
NodeName=gpu2 CoresPerSocket=8 CPUs=8 Gres=gpu:v100s-pcie-32gb:2 Sockets=1 ThreadsPerCore=1

# scontrol show node gpu2
NodeName=gpu2 Arch=x86_64 CoresPerSocket=8
   CPUAlloc=0 CPUEfctv=8 CPUTot=8 CPULoad=0.02
   AvailableFeatures=...
   Gres=gpu:tesla:0(S:0)
   NodeAddr=gpu2.example.com NodeHostName=gpu2 Version=23.02.3
   OS=Linux 4.18.0-477.13.1.el8_8.x86_64 #1 SMP Tue May 30 22:15:39 UTC 2023
   RealMemory=331301 AllocMem=0 FreeMem=334102 Sockets=1 Boards=1
   MemSpecLimit=500
   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=gpu
   BootTime=2023-06-14T23:03:05 SlurmdStartTime=2023-06-18T23:25:21
   LastBusyTime=2023-06-18T23:23:23 ResumeAfterTime=None
   CfgTRES=cpu=8,mem=331301M,billing=8,gres/gpu=2,gres/gpu:v100s-pcie-32gb=2
   AllocTRES=

# /var/log/slurm/slurmd.log (trimmed to only relevant lines for brevity)
[2023-06-19T11:29:25.629] GRES: Global AutoDetect=nvml(1)
[2023-06-19T11:29:25.629] debug:  gres/gpu: init: loaded
[2023-06-19T11:29:25.629] debug:  gpu/nvml: init: init: GPU NVML plugin loaded
[2023-06-19T11:29:26.265] debug2: gpu/nvml: _nvml_init: Successfully initialized NVML
[2023-06-19T11:29:26.265] debug:  gpu/nvml: _get_system_gpu_list_nvml: Systems Graphics Driver Version: 525.105.17
[2023-06-19T11:29:26.265] debug:  gpu/nvml: _get_system_gpu_list_nvml: NVML Library Version: 12.525.105.17
[2023-06-19T11:29:26.265] debug2: gpu/nvml: _get_system_gpu_list_nvml: NVML API Version: 11
[2023-06-19T11:29:26.265] debug2: gpu/nvml: _get_system_gpu_list_nvml: Total CPU count: 8
[2023-06-19T11:29:26.265] debug2: gpu/nvml: _get_system_gpu_list_nvml: Device count: 2
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml: GPU index 0:
[2023-06-19T11:29:26.302] debug2: gpu/nvml: _get_system_gpu_list_nvml:     Name: tesla_v100s-pcie-32gb

Re: [slurm-users] Aborting a job from inside the prolog

2023-06-19 Thread Gerhard Strangar
Alexander Grund wrote:

> Our first approach with `scancel $SLURM_JOB_ID; exit 1` doesn't seem to 
> work as the (sbatch) job still gets re-queued.

Try to exit with 0, because it's not your prolog that failed.
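
Roughly like this, as a minimal sketch (`site_specific_check` is just a 
placeholder for whatever condition triggers the abort):

#!/bin/bash
# Prolog sketch: cancel the job ourselves, then exit 0 so slurmd does not
# treat this as a prolog failure and requeue the job.
if ! site_specific_check; then
    scancel "$SLURM_JOB_ID"
fi
exit 0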



[slurm-users] PMIx3 plugin + Open MPI 4.1.5 broken for heterogeneous jobs with Slurm 21.08.8-2

2023-06-19 Thread Bertini, Denis Dr.
Hi

I made some progress trying to understand the problem I reported some weeks ago:


https://lists.schedmd.com/pipermail/slurm-users/2023-May/010027.html


I noticed that the intermittent connection timeout I am experiencing occurs 
only when using the TCP-based direct connection to establish communication 
between the stepd processes on different nodes.

When disabling the optimized direct connection using


export SLURM_PMIX_DIRECT_CONN=false


the submission of hetjobs is stable and no connection timeouts occur anymore.
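
For context, a sketch of how I submit such a hetjob with the workaround 
applied (the application names are placeholders, and --mpi=pmix is assumed):

export SLURM_PMIX_DIRECT_CONN=false
srun --mpi=pmix -N 1 -n 1 ./part_a : -N 2 -n 4 ./part_b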

Any idea what could go wrong when using the TCP-based direct connection 
together with hetjobs?

Cheers,
Denis

-
Denis Bertini
Abteilung: CIT
Ort: SB3 2.265a

Tel: +49 6159 71 2240
Fax: +49 6159 71 2986
E-Mail: d.bert...@gsi.de

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the GSI Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
Ministerialdirigent Dr. Volkmar Dietz