Hello,
I'm trying to understand the behavior of the fairshare factor. I set up munin
monitoring for several accounts and observe the changes over time, and
they're not clear to me.
Some background:
My users are split into two groups, sfglab and faculty;
in sfglab everyone is equal, and in faculty they a
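In case it is useful, the raw numbers that the fairshare factor is computed from can be watched directly with sshare; a minimal sketch (the column list is standard, which accounts you select is up to you):

$ sshare -a -o Account,User,RawShares,NormShares,RawUsage,EffectvUsage,FairShare

With the classic fairshare algorithm the last column is roughly 2^(-EffectvUsage/NormShares) (modulo the dampening factor), so sudden drops usually track jumps in EffectvUsage.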
Hello,
"Mccall, Kurt E. (MSFC-EV41)" writes:
> MPICH uses the PMI 1 interface by default, but for our 20.02.3 Slurm
> installation, "srun --mpi=list" yields:
>
> $ srun --mpi=list
> srun: MPI types are...
> srun: cray_shasta
> srun: pmi2
> srun: none
>
> PMI 2 is there, but no
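For what it's worth, once pmi2 shows up in that list you can select it explicitly at launch time, provided your MPICH was built against Slurm's PMI-2 library; a minimal sketch (the application name is just a placeholder):

$ srun --mpi=pmi2 -n 8 ./my_mpi_app

# or make it the site-wide default in slurm.conf
MpiDefault=pmi2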
Hi to all Slurm users,
We have the following issue: jobs with the highest priority are pending
forever with the "Resources" reason. More specifically, the jobs that pend
forever ask for 2 full nodes, but all other jobs from other users
(running or pending) need only 1/4 of a node, then the pending jobs ask
Hi Jeremy,
I saw similar behavior a long time ago, and I decided to set
SchedulerType=sched/builtin so that X nodes get emptied of jobs and that
high-priority job requesting more than one node can run. It is not ideal, but the
cluster has a low load, so a user who requests more than one node doesn't
delay t
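A minimal slurm.conf sketch of that setup (purely illustrative, not a recommendation):

SchedulerType=sched/builtin    # strict per-partition FIFO, nothing is scheduled around the top job

With the default sched/backfill you would instead leave SchedulerType alone and tune SchedulerParameters (bf_window, bf_resolution, etc.) so the multi-node job can hold a reservation while it waits.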
Hi,
I'm compiling Slurm with ansible playbooks from NVIDIA deepops framework
(https://github.com/NVIDIA/deepops). I'm trying to add UCX support. How
can I tell if UCX is actually included in the resulting binaries
(without actually using Slurm)? I was looking at executables and *.so
files with
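One thing that might help, short of running jobs: check what the installed binaries and plugins actually link against (the paths below are assumptions, adjust them to your install prefix):

$ ldd /usr/sbin/slurmd | grep -i ucx
$ ldd /usr/lib64/slurm/mpi_pmix*.so | grep -iE 'ucp|ucs|uct|ucx'
$ grep -i ucx config.log    # if the build tree is still around

If UCX was picked up at configure time, you would expect to see libucp/libucs among the libraries linked by the PMIx MPI plugin.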
I am not sure about the rest of the Slurm world, but since I will most likely
update OpenMPI more often than Slurm, I've configured and built OpenMPI with
UCX and Slurm support, and I think both are enabled by default unless you specify
the corresponding "--without" option. Works great so far!
Hello,
A few weeks ago, we tested Slurm against about 50K jobs, and observed at
least one instance where a node went idle, while there were jobs on the
queue that could have run on the idle node. The best guess as to why this
occurred, at this point, is that the default_queue_depth was set to the
Not answering every question below, but for (1) we're at 200 on a cluster with
a few dozen nodes and around 1k cores, as per
https://lists.schedmd.com/pipermail/slurm-users/2021-June/007463.html -- there
may be other settings in that email that could be beneficial. We had a lot of
idle resource
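For anyone searching later, that knob lives in SchedulerParameters in slurm.conf; a sketch with the value mentioned above (not a general recommendation):

SchedulerParameters=default_queue_depth=200,bf_continue

default_queue_depth controls how many jobs the main scheduling loop considers per cycle; bf_continue lets the backfill pass pick up where it left off after releasing its locks.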
Hello,
A user has requested that we set MaxStepCount to "unlimited" or 16 million to
accommodate some of their desired workflows. I searched around for details
about this parameter and didn't see a lot, and I reviewed
https://bugs.schedmd.com/show_bug.cgi?id=5722
Any thoughts on this? Can this suc
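For what it's worth, MaxStepCount is just a slurm.conf limit (the default is 40000 steps per job), so the mechanical part is a one-liner; whether 16 million is wise is a separate question:

# slurm.conf
MaxStepCount=16000000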
On 12.01.22 at 17:54, Matthias Leopold wrote:
Hi,
I'm compiling Slurm with ansible playbooks from NVIDIA deepops framework
(https://github.com/NVIDIA/deepops). I'm trying to add UCX support. How
can I tell if UCX is actually included in the resulting binaries
(without actually using Slurm
Hi! I have a problem with enforcing memory limits...
I'm using cgroups to enforce the limits, and I expected that when the
cgroup memory limit is reached the job is killed;
instead I see in the log a lot of oom-killer reports that act only on a certain
process from the cgroup...
Did I miss
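For comparison, the settings that matter on our side look roughly like this (a sketch, assuming task/cgroup is already enabled):

# cgroup.conf
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes

# slurm.conf
TaskPlugin=task/affinity,task/cgroup

As far as I understand, with cgroup limits it is the kernel oom-killer doing the enforcement, and it kills individual processes inside the job's cgroup rather than cancelling the whole job, so the per-process oom-killer reports may be expected behavior rather than a misconfiguration.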
David Henkemeyer writes:
> 3) Is there a way to see the order of the jobs in the queue? Perhaps
> squeue lists the jobs in order?
squeue -S -p
Sort jobs in descending priority order.
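A slightly fuller sketch that also prints the priority value (the format string is just an example):

$ squeue --state=PENDING -S -p -o "%.18i %.9P %.8u %.10Q %j"

%Q is the job priority, and -S -p sorts by decreasing priority; sprio -l shows how that priority breaks down into its weighted factors.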
--
B/H