Re: [slurm-users] Questions about SLURM configure options

2023-04-25 Thread Paul H. Hargrove
Elliott, see https://slurm.schedmd.com/mpi_guide.html for the proper use of `--with-pmix`. I don't know anything about your other questions. -Paul On Tue, Apr 25, 2023 at 8:52 PM Elliott Slaughter wrote: > Oh, and also, does it matter if hwloc itself is built against CUDA/NVML or > not

Re: [slurm-users] Questions about SLURM configure options

2023-04-25 Thread Elliott Slaughter
Oh, and also, does it matter if hwloc itself is built against CUDA/NVML or not? Will SLURM take advantage of that functionality if available, and if so what capabilities does it add? On Tue, Apr 25, 2023 at 8:28 PM Elliott Slaughter wrote: > Hi, > > I have some questions about SLURM configuratio

[slurm-users] Questions about SLURM configure options

2023-04-25 Thread Elliott Slaughter
Hi, I have some questions about SLURM configuration options: --with-pmix I was confused about this because I thought that SLURM had its own first-party PMIx implementation, but I can't see a configuration option to control it. (And also, building without this option does not appear to generate a

Re: [slurm-users] Terminating Jobs based on GrpTRESMins

2023-04-25 Thread Hoot Thompson
So Ole, any thoughts on the config info I sent? I’m still not certain if terminating a running job based on GrpTRESMins is even possible or supposed to work. Hoot > On Apr 24, 2023, at 3:21 PM, Hoot Thompson wrote: > > See below…... > >> On Apr 24, 2023, at 1:55 PM, Ole Holm Nielsen >> w

[slurm-users] scanceling a job puts the node in a draining state

2023-04-25 Thread Patrick Goetz
Hi - This was a known bug: https://bugs.schedmd.com/show_bug.cgi?id=3941 However, the bug report says this was fixed in version 17.02.7. The problem is we're running version 17.11.2, but appear to still have this bug going on: [2023-04-18T17:09:42.482] _slurm_rpc_kill_job: REQUEST_KILL_JOB

Re: [slurm-users] unable to kill namd3 process

2023-04-25 Thread Shaghuf Rahman
Hi, I also forgot to mention that the process is still running after the user runs scancel, and the epilog does not clean it up when one job finishes during multiple job submissions. We tried the unkillable option, but it did not work. The process remains until it is killed manually. On Tue, 25 Apr 2023 at

[slurm-users] unable to kill namd3 process

2023-04-25 Thread Shaghuf Rahman
Hi, We are facing an issue in our environment, and the behaviour looks strange to me. It is specific to the namd3 application. The issue is described below, along with some cases I have put together. I am trying to understand how to kill the processes of the namd3 application submitted