date:20220112

[slurm-users] Understanding fairshare factor

2022-01-12 Thread Michał Kadlof

Hello, I'm trying to understand the behavior of the fairshare factor. I set up Munin monitoring for several accounts and watch the changes over time, and they are not clear to me. Some background: my users are split into two groups, sfglab and faculty. In sfglab everyone is equal, and in faculty they a

Re: [slurm-users] Slurm and MPICH

2022-01-12 Thread Roger Mason

Hello,

"Mccall, Kurt E. (MSFC-EV41)" writes:

> MPICH uses the PMI 1 interface by default, but for our 20.02.3 Slurm
> installation, "srun --mpi=list" yields
>
> $ srun --mpi=list
> srun: MPI types are...
> srun: cray_shasta
> srun: pmi2
> srun: none
>
> PMI 2 is there, but no
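The plugin list quoted in this message can be checked programmatically. The following is a minimal sketch, assuming the output keeps the `srun:` prefix and header line shown in the quote (other Slurm versions may format it differently); the function name `parse_mpi_types` is hypothetical and the sample text is copied from the message.

```python
from typing import List


def parse_mpi_types(output: str) -> List[str]:
    """Extract plugin names from `srun --mpi=list` style output.

    Assumes each entry is on its own line prefixed with "srun:",
    as in the message quoted in this thread.
    """
    types = []
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("srun:"):
            continue
        value = line[len("srun:"):].strip()
        # Skip the header line ("MPI types are...") and empty entries.
        if not value or value.lower().startswith("mpi types"):
            continue
        types.append(value)
    return types


# Sample text taken from the quoted message.
sample = """srun: MPI types are...
srun: cray_shasta
srun: pmi2
srun: none
"""

available = parse_mpi_types(sample)
print(available)            # ['cray_shasta', 'pmi2', 'none']
print("pmi2" in available)  # True
```

On this sample the check confirms that pmi2 is offered, which matches the quoted observation that PMI 2 is available.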

[slurm-users] Scheduler does not reserve resources

2022-01-12 Thread Jérémy Lapierre

Hi to all Slurm users, we have the following issue: jobs with the highest priority stay pending forever with the "Resources" reason. More specifically, the jobs pending forever ask for 2 full nodes, but all other jobs from other users (running or pending) need only 1/4 of a node, then pending jobs ask

Re: [slurm-users] Scheduler does not reserve resources

2022-01-12 Thread Rodrigo Santibáñez

Hi Jeremy, I saw similar behavior a long time ago, and I decided to set SchedulerType=sched/builtin to empty X nodes of jobs and execute the high-priority job requesting more than one node. It is not ideal, but the cluster has low load, so a user that requests more than one node doesn't delay t

[slurm-users] Building Slurm with UCX support

2022-01-12 Thread Matthias Leopold

Hi, I'm compiling Slurm with Ansible playbooks from the NVIDIA deepops framework (https://github.com/NVIDIA/deepops). I'm trying to add UCX support. How can I tell if UCX is actually included in the resulting binaries (without actually using Slurm)? I was looking at executables and *.so files with

Re: [slurm-users] [EXT] Building Slurm with UCX support

2022-01-12 Thread Ozeryan, Vladimir

I am not sure about the rest of the Slurm world, but since I will most likely update OpenMPI more often than Slurm, I've configured and built OpenMPI with UCX and Slurm support, and I think they are both enabled by default unless you specify the "--without" option. Works great so far!

[slurm-users] Questions about default_queue_depth

2022-01-12 Thread David Henkemeyer

Hello, a few weeks ago we tested Slurm against about 50K jobs and observed at least one instance where a node went idle while there were jobs in the queue that could have run on the idle node. The best guess as to why this occurred, at this point, is that the default_queue_depth was set to the
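The hypothesis in this message can be illustrated with a toy model. This is a sketch under stated assumptions, not Slurm's scheduler: it assumes a scheduling pass examines only the first `depth` jobs in priority order and starts any of them that fit on the free nodes. The function `schedule_pass`, the job names, and the node counts are all hypothetical.

```python
from typing import List, Tuple


def schedule_pass(
    queue: List[Tuple[str, int]], free_nodes: int, depth: int
) -> Tuple[List[str], int]:
    """Toy scheduling pass that looks at only the first `depth` jobs.

    `queue` holds (job_name, nodes_needed) pairs already sorted by
    priority. Returns the started job names and the nodes left free.
    """
    started = []
    for name, nodes_needed in queue[:depth]:
        if nodes_needed <= free_nodes:
            started.append(name)
            free_nodes -= nodes_needed
    return started, free_nodes


# Hypothetical queue: three wide jobs ahead of one single-node job.
queue = [("wide-1", 4), ("wide-2", 4), ("wide-3", 4), ("small-1", 1)]

shallow = schedule_pass(queue, free_nodes=1, depth=3)
deep = schedule_pass(queue, free_nodes=1, depth=4)
print(shallow)  # ([], 1)
print(deep)     # (['small-1'], 0)
```

With a depth of 3 the single free node stays idle even though `small-1` could run; a deeper pass reaches it. Whether this matches the behavior observed on the real cluster depends on the actual scheduler configuration.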

Re: [slurm-users] Questions about default_queue_depth

2022-01-12 Thread Renfro, Michael

Not answering every question below, but for (1) we're at 200 on a cluster with a few dozen nodes and around 1k cores, as per https://lists.schedmd.com/pipermail/slurm-users/2021-June/007463.html -- there may be other settings in that email that could be beneficial. We had a lot of idle resource

[slurm-users] big increase of MaxStepCount?

2022-01-12 Thread John R Anderson

Hello, a user has requested that we set MaxStepCount to "unlimited" or 16 million to accommodate some of their desired workflows. I searched around for details about this parameter and don't see a lot, and I reviewed https://bugs.schedmd.com/show_bug.cgi?id=5722. Any thoughts on this? Can this suc

Re: [slurm-users] Building Slurm with UCX support

2022-01-12 Thread Matthias Leopold

On 12.01.22 at 17:54, Matthias Leopold wrote: Hi, I'm compiling Slurm with Ansible playbooks from the NVIDIA deepops framework (https://github.com/NVIDIA/deepops). I'm trying to add UCX support. How can I tell if UCX is actually included in the resulting binaries (without actually using Slurm

[slurm-users] memory limits:: why job is not killed but oom-killer steps up?

2022-01-12 Thread Adrian Sevcenco

Hi! I have a problem with enforcing memory limits. I'm using cgroups to enforce the limits, and I had expected that when the cgroup memory limit is reached the job is killed. Instead I see a lot of oom-killer reports in the log that act only on a certain process from the cgroup. Did I miss

Re: [slurm-users] Questions about default_queue_depth

2022-01-12 Thread Bjørn-Helge Mevik

David Henkemeyer writes:

> 3) Is there a way to see the order of the jobs in the queue? Perhaps
> squeue lists the jobs in order?

squeue -S -p

Sort jobs in descending priority order.

-- B/H
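For scripted use, the same sort can be requested through the long option `--sort`, where a leading `-` on the field means descending order. The sketch below only builds the argument list; the helper name `squeue_by_priority` and the choice of output columns (`%i` job ID, `%Q` integer priority, `%u` user, `%T` state) are assumptions added for illustration, not part of the reply.

```python
from typing import List


def squeue_by_priority(descending: bool = True) -> List[str]:
    """Build an squeue argument list sorted by job priority.

    The sort field "p" is the priority; "-" requests descending and
    "+" ascending order. The --format columns are an illustrative
    choice, not something specified in the thread.
    """
    sign = "-" if descending else "+"
    return ["squeue", f"--sort={sign}p", "--format=%i %Q %u %T"]


cmd = squeue_by_priority()
print(cmd)  # ['squeue', '--sort=-p', '--format=%i %Q %u %T']
```

On a host with the Slurm client tools installed, the list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` to capture the ordered queue.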
