I just ran into this issue. Specifically, SLURM's configure script looks for
the NVML header file (`nvml.h`), which comes with CUDA or DCGM, in addition to
the NVML library that comes with the drivers. The check is at
https://github.com/SchedMD/slurm/blob/a763a008b7700321b51aad2e619deab00638a379/auxdir/x_ac_nvml.m4#L32.
Once you
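
To see ahead of time whether both pieces are present on a build host, a small pre-flight check can help. The following is only a sketch and not SLURM's actual configure logic: the function name `find_nvml`, the default prefixes, and the library subdirectories are assumptions chosen for illustration, so adjust them to your site's layout.

```python
from pathlib import Path

# Assumed install prefixes (hypothetical defaults; adjust for your site).
DEFAULT_PREFIXES = ("/usr/local/cuda", "/usr")
# Assumed library subdirectories to search under each prefix.
LIB_SUBDIRS = ("lib64", "lib", "lib64/stubs", "lib/stubs", "lib/x86_64-linux-gnu")


def find_nvml(prefixes=DEFAULT_PREFIXES):
    """Return (header, library) paths for the first nvml.h and
    libnvidia-ml.so* found under the given prefixes; each is None if missing."""
    header = None
    library = None
    for prefix in map(Path, prefixes):
        candidate = prefix / "include" / "nvml.h"
        if header is None and candidate.is_file():
            header = candidate
        if library is None:
            for subdir in LIB_SUBDIRS:
                libdir = prefix / subdir
                if not libdir.is_dir():
                    continue
                matches = sorted(libdir.glob("libnvidia-ml.so*"))
                if matches:
                    library = matches[0]
                    break
    return header, library


if __name__ == "__main__":
    header, library = find_nvml()
    print("NVML header: ", header or "not found")
    print("NVML library:", library or "not found")
```

If the header is reported as missing while the library is found, that matches the situation described above: the drivers are installed, but the CUDA or DCGM development files are not.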

Is it possible to dynamically change `JobFileAppend`/`open-mode` behavior? I’m
using EpilogSlurmctld to automatically requeue jobs that exit with a certain
code, and would like to have those append rather than overwrite, but it seems
blunt to set `JobFileAppend=1` and force people who want the defau

Hello all,
I’m trying to implement multiple “ephemeral” queues that allow general usage of
project-specific hardware, but with preemption. One partition would wait a
while before jobs are preempted; in another, preemption would occur almost
immediately, using a very short time to emulate a short g