You are welcome Loris!

--
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
On 2/26/19, 8:16 AM, "slurm-users on behalf of Loris Bennett" <slurm-users-boun...@lists.schedmd.com on behalf of loris.benn...@fu-berlin.de> wrote:

    Hi Chris,

    I had

        JobAcctGatherType=jobacct_gather/linux
        TaskPlugin=task/affinity
        ProctrackType=proctrack/cgroup

    ProctrackType was actually unset, but cgroup is the default.  I have now
    changed the settings to

        JobAcctGatherType=jobacct_gather/cgroup
        TaskPlugin=task/affinity,task/cgroup
        ProctrackType=proctrack/cgroup

    and added the following to cgroup.conf:

        TaskAffinity=no
        ConstrainCores=yes
        ConstrainRAMSpace=yes

    For at least one job this gives me the following while the job is running:

        $ seff -d 4896
        Slurm data: JobID ArrayJobID User Group State Clustername Ncpus Nnodes Ntasks Reqmem PerNode Cput Walltime Mem ExitStatus
        Slurm data: 4896  loris sc RUNNING curta 8 2 2 2097152 0 0 33 3.6028797018964e+16 0
        Job ID: 4896
        Cluster: curta
        User/Group: loris/sc
        State: RUNNING
        Nodes: 2
        Cores per node: 4
        CPU Utilized: 00:00:00
        CPU Efficiency: 0.00% of 00:04:24 core-walltime
        Job Wall-clock time: 00:00:33
        Memory Utilized: 32.00 EB (estimated maximum)
        Memory Efficiency: 1717986918400.00% of 2.00 GB (256.00 MB/core)
        WARNING: Efficiency statistics may be misleading for RUNNING jobs.

    and this at completion:

        $ seff -d 4896
        Slurm data: JobID ArrayJobID User Group State Clustername Ncpus Nnodes Ntasks Reqmem PerNode Cput Walltime Mem ExitStatus
        Slurm data: 4896  loris sc COMPLETED curta 8 2 2 2097152 0 0 61 59400 0
        Job ID: 4896
        Cluster: curta
        User/Group: loris/sc
        State: COMPLETED (exit code 0)
        Nodes: 2
        Cores per node: 4
        CPU Utilized: 00:00:00
        CPU Efficiency: 0.00% of 00:08:08 core-walltime
        Job Wall-clock time: 00:01:01
        Memory Utilized: 58.01 MB (estimated maximum)
        Memory Efficiency: 2.83% of 2.00 GB (256.00 MB/core)

    which looks good.  I'll see how it goes with longer-running jobs.

    Thanks for the input,

    Loris

    Christopher Benjamin Coffey <chris.cof...@nau.edu> writes:

    > Hi Loris,
    >
    > Odd, we never saw that issue with the memory efficiency being out of
    > whack, just the cpu efficiency.  We are running 18.08.5-2 and here is a
    > 512 core job run last night:
    >
    > Job ID: 18096693
    > Array Job ID: 18096693_5
    > Cluster: monsoon
    > User/Group: abc123/cluster
    > State: COMPLETED (exit code 0)
    > Nodes: 60
    > Cores per node: 8
    > CPU Utilized: 01:34:06
    > CPU Efficiency: 58.04% of 02:42:08 core-walltime
    > Job Wall-clock time: 00:00:19
    > Memory Utilized: 36.04 GB (estimated maximum)
    > Memory Efficiency: 30.76% of 117.19 GB (1.95 GB/node)
    >
    > What job collection, task, and proctrack plugins are you using? I'm curious.
    > We are using:
    >
    > JobAcctGatherType=jobacct_gather/cgroup
    > TaskPlugin=task/cgroup,task/affinity
    > ProctrackType=proctrack/cgroup
    >
    > Also cgroup.conf:
    >
    > ConstrainCores=yes
    > ConstrainRAMSpace=yes
    >
    > Best,
    > Chris
    >
    > --
    > Christopher Coffey
    > High-Performance Computing
    > Northern Arizona University
    > 928-523-1167
    >
    >
    > On 2/26/19, 2:15 AM, "slurm-users on behalf of Loris Bennett"
    > <slurm-users-boun...@lists.schedmd.com on behalf of loris.benn...@fu-berlin.de> wrote:
    >
    >     Hi,
    >
    >     With seff 18.08.5-2 we have been getting spurious results regarding
    >     memory usage:
    >
    >     $ seff 1230_27
    >     Job ID: 1234
    >     Array Job ID: 1230_27
    >     Cluster: curta
    >     User/Group: xxxxxxxxx/xxxxxxxxx
    >     State: COMPLETED (exit code 0)
    >     Nodes: 4
    >     Cores per node: 25
    >     CPU Utilized: 9-16:49:18
    >     CPU Efficiency: 30.90% of 31-09:35:00 core-walltime
    >     Job Wall-clock time: 07:32:09
    >     Memory Utilized: 48.00 EB (estimated maximum)
    >     Memory Efficiency: 26388279066.62% of 195.31 GB (1.95 GB/core)
    >
    >     It seems that the more cores are involved, the worse the overcalculation
    >     is, but not linearly.
    >
    >     Has anyone else seen this?
    >
    >     Cheers,
    >
    >     Loris
    >
    >     --
    >     Dr. Loris Bennett (Mr.)
    >     ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de

    --
    Dr. Loris Bennett (Mr.)
    ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de
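For anyone reproducing the configuration change discussed in this thread, a minimal sketch of how the numbers can be sanity-checked; the job ID 4896 is just the example from the messages above, so substitute your own:

    # Confirm which accounting/tracking plugins the cluster is actually running
    # (should match the JobAcctGatherType, ProctrackType and TaskPlugin lines quoted above).
    scontrol show config | grep -E 'JobAcctGatherType|ProctrackType|TaskPlugin'

    # Cross-check seff's "Memory Utilized" against the raw accounting record;
    # MaxRSS is the peak resident set size stored per step in the accounting database.
    sacct -j 4896 --format=JobID,State,Elapsed,TotalCPU,MaxRSS,ReqMem

    # seff itself; -d additionally prints the raw "Slurm data:" line it parses,
    # as shown in the output above.
    seff -d 4896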