[slurm-users] GPU Gres Type inconsistencies

2023-06-19 Thread Ben Roberts
Hi all, I'm trying to set up GPU Gres Types to correctly identify the installed hardware (generation and memory size). I'm using a mix of explicit configuration (to set a friendly type name) and autodetection (to handle the cores and links detection). I'm seeing two related issues which I don't

Re: [slurm-users] Aborting a job from inside the prolog

2023-06-19 Thread Gerhard Strangar
Alexander Grund wrote:
> Our first approach with `scancel $SLURM_JOB_ID; exit 1` doesn't seem to
> work as the (sbatch) job still gets re-queued.
Try to exit with 0, because it's not your prolog that failed.

[slurm-users] PMIx3 plugin + OpenMPI 4.1.5 broken for heterogeneous jobs with SLURM v21.08.8-2

2023-06-19 Thread Bertini, Denis Dr.
Hi, I made some progress trying to understand the problem I reported some weeks ago (https://lists.schedmd.com/pipermail/slurm-users/2023-May/010027.html). I noticed that the intermittent connection timeout I am experiencing occurs only when using the TCP-based direct connection to establi