[slurm-users] PMIx + openMPI with heterogeneous jobs

2023-05-24 Thread Bertini, Denis Dr.
I am facing the same problem that was reported long ago (2019) in this mailing list thread: https://lists.schedmd.com/pipermail/slurm-users/2019-July/003785.html but with more recent versions, i.e. Slurm 21.08.8-2, PMIx 2.2.5 (pmix-2.2.5-1.el8.src.rpm), openMPI 4.1.5. In a similar
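For context, Slurm launches heterogeneous jobs as colon-separated components, and PMIx is selected with `--mpi=pmix`. A minimal sketch of such a launch (task counts and the executable name `my_app` are illustrative assumptions, not taken from the thread):

```shell
# Heterogeneous srun launch under PMIx with two components:
# component 0 runs 4 tasks, component 1 runs 2 tasks (values illustrative).
srun --mpi=pmix -n 4 ./my_app : -n 2 ./my_app
```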

[slurm-users] Restrictions for new/inefficient users?

2023-05-24 Thread Loris Bennett
Hi, We have the problem that an increasing number of new users have little to no idea how many resources their programs can use efficiently. Thus, they will often just request 32 cores, because that's what most of our nodes have, and 128 or 256 GB, for reasons which are unclear to me, ev
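One common way to cap such requests (an assumption about the poster's goal, since the message is truncated) is a QOS with per-user TRES limits attached to new accounts; the QOS name, user name, and limit values below are illustrative:

```shell
# Create a restrictive QOS and attach it to a new user's association.
sacctmgr add qos newuser
sacctmgr modify qos newuser set MaxTRESPerUser=cpu=8,mem=32G
sacctmgr modify user someuser set qos=newuser
```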

[slurm-users] hi-priority partition and preemption

2023-05-24 Thread Fabrizio Roccato
Hi all, I'm trying to have two overlapping partitions, say normal and hi-pri, so that when jobs are launched in the second one they can preempt the jobs already running in the first one, automatically putting them in the suspend state. After completion, the jobs in the normal partition must b
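A minimal slurm.conf sketch of this kind of suspend-based preemption between overlapping partitions (node names and tier values are illustrative; note that `PreemptMode=SUSPEND` requires gang scheduling):

```
# slurm.conf (fragment) - partition-priority preemption with suspend
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG
PartitionName=normal Nodes=node[01-10] PriorityTier=1 Default=YES
PartitionName=hi-pri Nodes=node[01-10] PriorityTier=2
```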

Re: [slurm-users] hi-priority partition and preemption

2023-05-24 Thread Loris Bennett
Hi Fabrizio, Fabrizio Roccato writes: > Hi all, > I'm trying to have two overlapping partitions, say normal and hi-pri, > so that when jobs are launched in the second one they can preempt the jobs > already running in the first one, automatically putting them in the suspend > state. After comp

[slurm-users] slurmstepd error after upgrade to 23.02

2023-05-24 Thread Hagdorn, Magnus Karl Moritz
Hi all, we have recently upgraded Slurm to 23.02. Since then we have been getting the following errors in our logs: May 21 03:23:27 s-sc-gpu001 slurmstepd[2723991]: error: slurm_send_node_msg: hash_g_compute: REQUEST_STEP_COMPLETE has error May 21 03:24:27 s-sc-gpu001 slurmstepd[2723991]: error: hash_g_co

Re: [slurm-users] hi-priority partition and preemption

2023-05-24 Thread Groner, Rob
What you are describing is definitely doable. We have our system set up similarly. All nodes are in the "open" partition and the "prio" partition, but a job submitted to the "prio" partition will preempt the open jobs. I don't see anything clearly wrong with your slurm.conf settings. Ours are ver
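The overlapping-partition setup described here can be sketched as a slurm.conf fragment; the partition names match the reply, but the per-partition settings are an assumption about how such a configuration typically looks:

```
# slurm.conf (fragment) - all nodes in both partitions, "prio" preempts "open"
PartitionName=open Nodes=ALL Default=YES PriorityTier=1 PreemptMode=SUSPEND
PartitionName=prio Nodes=ALL PriorityTier=2 PreemptMode=OFF
```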

[slurm-users] Usage gathering for GPUs

2023-05-24 Thread Fulton, Ben
Hi, The release notes for 23.02 say "Added usage gathering for gpu/nvml (Nvidia) and gpu/rsmi (AMD) plugins". How would I go about enabling this? Thanks! -- Ben Fulton Research Applications and Deep Learning Research Technologies Indiana University
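As a rough guess at what enabling this involves (the reply below is truncated, so this is not confirmed by the thread): Slurm must be built against the vendor libraries (NVML for NVIDIA, ROCm-SMI for AMD), and GPUs must be tracked as accountable TRES, e.g.:

```
# slurm.conf (fragment) - track GPUs as an accountable TRES
GresTypes=gpu
AccountingStorageTRES=gres/gpu
```

```
# gres.conf - let Slurm autodetect NVIDIA GPUs via NVML
AutoDetect=nvml
```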

Re: [slurm-users] Usage gathering for GPUs

2023-05-24 Thread Christopher Samuel
On 5/24/23 11:39 am, Fulton, Ben wrote: > The release notes for 23.02 say "Added usage gathering for gpu/nvml (Nvidia) and gpu/rsmi (AMD) plugins". How would I go about enabling this? Hi Ben, I can only comment on the nvidia side (as those are the GPUs we have) but for that you need S