Re: [slurm-users] Cgroups and swap with 18.08.1?

2018-10-19 Thread John Hearns
After doing some Googling: https://jvns.ca/blog/2017/02/17/mystery-swap/ "Swapping is weird and confusing" (Amen to that!) and https://jvns.ca/blog/2016/12/03/how-much-memory-is-my-process-using-/ (an interesting article). From the Docker documentation, below. Bill - this is what you are seeing. Twice as
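For context: the limits behind this behaviour can be read straight from the job's memory cgroup. A minimal sketch, assuming cgroup v1 and an illustrative uid/job path (the actual hierarchy depends on your cgroup.conf):

    # RAM limit for the job
    cat /sys/fs/cgroup/memory/slurm/uid_1000/job_1234/memory.limit_in_bytes
    # combined RAM+swap limit; if this is larger than the RAM
    # limit, the job is allowed to spill the difference into swap
    cat /sys/fs/cgroup/memory/slurm/uid_1000/job_1234/memory.memsw.limit_in_bytes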

Re: [slurm-users] Resource sharing between different clusters

2018-10-19 Thread Chris Samuel
On Saturday, 20 October 2018 6:00:55 AM AEDT, Cao, Lei wrote: > Yes, but I was a little confused by it. Does the cluster being shared run its > own slurmctld and slurmds on its nodes, or does it have to run multiple sets of > slurmds, each of which belongs to a cluster that is sharing it? My understandin
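A minimal sketch of the usual multi-cluster arrangement, with hypothetical host names: each cluster keeps its own slurmctld and its own slurmds, and the clusters only share a single slurmdbd:

    # slurm.conf on cluster1 (cluster2 is identical apart from the names)
    ClusterName=cluster1
    SlurmctldHost=ctld1.example.com
    AccountingStorageType=accounting_storage/slurmdbd
    AccountingStorageHost=dbd.example.com   # the shared slurmdbd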

Re: [slurm-users] Cgroups and swap with 18.08.1?

2018-10-19 Thread Chris Samuel
On Tuesday, 16 October 2018 2:47:34 PM AEDT, Bill Broadley wrote: > AllowedSwapSpace=0 > > So I expect jobs not to use swap. It turns out that if I run a 3GB RAM process with > sbatch --mem=1000, I just get a process that uses 1GB of RAM and 2GB of swap. That's intended. The manual page says: Allow
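The key detail in this thread is that AllowedSwapSpace is only enforced when swap constraining is actually switched on. A minimal cgroup.conf sketch (check the cgroup.conf man page for your version's exact semantics):

    # cgroup.conf
    CgroupAutomount=yes
    ConstrainRAMSpace=yes
    ConstrainSwapSpace=yes    # without this, AllowedSwapSpace is not applied
    AllowedSwapSpace=0        # RAM+swap limit == RAM limit: no extra swap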

[slurm-users] requesting entire vs. partial nodes

2018-10-19 Thread Noam Bernstein
Hi - I have a Slurm usage question that I haven't been able to figure out from the docs. We basically have two types of jobs - ones that require entire nodes, and ones that do not. An additional (minor) complication is that the nodes have hyperthreading enabled, but we (usually) want to use on
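A minimal sketch of the two request styles, assuming hypothetical job scripts and a core-based selection setup (e.g. SelectTypeParameters=CR_Core); the exact behaviour depends on the site configuration:

    # whole-node job: grab every core on the allocated node(s)
    sbatch --exclusive --nodes=2 whole_node.sh

    # partial-node job: 8 tasks, one per physical core,
    # leaving the second hardware thread of each core idle
    sbatch --ntasks=8 --hint=nomultithread partial.sh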

[slurm-users] Preemption: Not receiving signal

2018-10-19 Thread nico.faerber
Hi all, according to the SLURM documentation, the SIGCONT and SIGTERM signals are sent twice to a job that is selected for preemption: "Once a job has been selected for preemption, its end time is set to the current time plus GraceTime. The job is immediately sent SIGCONT and SIGTERM signals in o
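For jobs that want to react to that SIGTERM within GraceTime, one common pattern is trapping it in the batch script. A minimal sketch (./my_app is a placeholder; whether the batch shell actually receives the signal can depend on how the job and site are configured):

    #!/bin/bash
    #SBATCH --time=1:00:00

    cleanup() {
        echo "SIGTERM caught - checkpointing before GraceTime expires"
        # ... save state here ...
        exit 0
    }
    trap cleanup TERM

    ./my_app &   # run in the background so the trap can fire promptly
    wait         # wait returns when the signal arrives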

Re: [slurm-users] Resource sharing between different clusters

2018-10-19 Thread Cao, Lei
Yes, but I was a little confused by it. Does the cluster being shared run its own slurmctld and slurmds on its nodes, or does it have to run multiple sets of slurmds, each of which belongs to a cluster that is sharing it? Thanks, Ray From: slurm-users on behalf of B

Re: [slurm-users] Cgroups and swap with 18.08.1?

2018-10-19 Thread Bill Broadley
On 10/16/18 3:38 AM, Bjørn-Helge Mevik wrote: > Just a tip: Make sure that the kernel has support for constraining swap > space. I believe we once had to reinstall one of our clusters > because we had forgotten to check that. I tried starting slurmd with -D -v -v -v and got: slurmd: debug:
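Two quick checks for whether the running kernel can actually account (and therefore constrain) swap - on many distros the code is compiled in but disabled unless swapaccount=1 is on the kernel command line (paths assume cgroup v1):

    # was swap accounting enabled at boot?
    grep -o 'swapaccount=[01]' /proc/cmdline

    # the memsw control files only exist when it is on:
    ls /sys/fs/cgroup/memory/memory.memsw.limit_in_bytes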

Re: [slurm-users] Resource sharing between different clusters

2018-10-19 Thread Benjamin Redling
On 18/10/2018 18:16, Cao, Lei wrote: > I am pretty new to Slurm, so please bear with me. I have the following > scenario and I wonder if Slurm currently supports this in some way. > > Let's say I have 3 clusters. Cluster1 and cluster2 run their own > slurmctld and slurmds (this is a hard re
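Once every cluster registers with the same slurmdbd, jobs can be routed between them from a single login node. A sketch of the client side (the cluster names are illustrative):

    sbatch -M cluster3 job.sh   # submit to cluster3's slurmctld
    squeue -M all               # queues on every cluster known to the dbd
    sacctmgr list clusters      # what the shared slurmdbd knows about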

Re: [slurm-users] Can frequent hold-release adversely affect slurm?

2018-10-19 Thread Daniel Letai
On 18/10/2018 20:34, Eli V wrote: On Thu, Oct 18, 2018 at 1:03 PM Daniel Letai wrote: Hello all, To solve a requirement where a large number of job arrays (~10k arrays, each with at most 8M elements) with the same priority should be executed wit
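For reference, the hold/release operations under discussion; whether releasing a task range in this form is accepted can depend on the Slurm version, so treat it as a sketch:

    scontrol hold 12345                # hold all pending elements of array 12345
    scontrol release 12345             # release the whole array
    scontrol release 12345_[100-199]   # release only a slice of the tasks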