[slurm-dev] srun is not assigning task to a particular logical CPU using slurm

2017-10-27 Thread Animesh Kuity
Hi everyone, My objective: I want to assign a few tasks to the logical CPUs belonging to a particular socket (e.g., socket 0) and, at another time, assign another set of tasks to the logical CPUs belonging to another socket (e.g., socket 1). In summary, I want to achieve task affinity to a
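
One way to check where a task actually landed is to have each task print its own affinity mask. The following is a minimal sketch, assuming a Linux node (`os.sched_getaffinity` is Linux-only); the name `report_affinity` is illustrative and not part of Slurm.

```python
import os
import socket


def report_affinity():
    """Return (hostname, sorted logical CPU ids this process is allowed to run on)."""
    cpus = sorted(os.sched_getaffinity(0))  # 0 means "the calling process"
    return socket.gethostname(), cpus


if __name__ == "__main__":
    host, cpus = report_affinity()
    # SLURM_PROCID is set by srun for each task; it is absent outside Slurm.
    rank = os.environ.get("SLURM_PROCID", "n/a")
    print(f"task {rank} on {host}: allowed CPUs {cpus}")
```

Launching this script under srun with the binding options being tested, and comparing the printed ids against the socket layout reported by a tool such as lscpu or hwloc, should show whether the tasks stayed on the intended socket.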

[slurm-dev] Re: CPU/GPU Affinity Not Working

2017-10-27 Thread Michael Di Domenico
On Thu, Oct 26, 2017 at 1:39 PM, Kilian Cavalotti wrote: > and for a 4-GPU node which has a gres.conf like this (don't ask, some > vendors like their CPU ids alternating between sockets): > > NodeName=sh-114-03 name=gpu File=/dev/nvidia[0-1] > CPUs=0,2,4,6,8,10,12,14,16,18 > NodeName=sh-114-
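
When CPU ids alternate between sockets as in the quoted gres.conf, the per-socket CPU list can be generated rather than typed by hand. This is a small sketch only: the helper name `cpus_for_socket` is hypothetical, and the node size (2 sockets, 20 logical CPUs) is an assumption inferred from the quoted list, not something stated in the thread.

```python
def cpus_for_socket(socket_id, n_sockets, n_cpus):
    """Comma-separated CPU ids of one socket, assuming ids are dealt
    round-robin across sockets (0 -> socket 0, 1 -> socket 1, 2 -> socket 0, ...)."""
    return ",".join(str(c) for c in range(socket_id, n_cpus, n_sockets))


# Assumed layout: 2 sockets, 20 logical CPUs.
print(cpus_for_socket(0, 2, 20))  # 0,2,4,6,8,10,12,14,16,18
print(cpus_for_socket(1, 2, 20))  # 1,3,5,7,9,11,13,15,17,19
```

The actual numbering on a given node should still be confirmed with a topology tool before putting the result in gres.conf.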

[slurm-dev] Re: CPU/GPU Affinity Not Working

2017-10-27 Thread Kilian Cavalotti
Hi Michael, On Fri, Oct 27, 2017 at 4:44 AM, Michael Di Domenico wrote: > as an aside, is there some tool which provides the optimal mapping of > CPU id's to GPU cards? We use nvidia-smi: -- 8< - # nvidia-

[slurm-dev] slurm and hwloc 2

2017-10-27 Thread Samuel Thibault
Hello, hwloc 2 is to be released sooner or later, and it introduces a few API changes. The build log for version slurm-llnl-17.02.7 of slurm shows: ../../../../../src/plugins/task/cgroup/task_cgroup_cpuset.c: In function '_get_cpuinfo': ../../../../../src/plugins/task/cgroup/task_cgroup_cpuset.

[slurm-dev] Re: CPU/GPU Affinity Not Working

2017-10-27 Thread Dave Sizer
Also, supposedly adding the "--accel-bind=g" option to srun will do this, though we are observing that this is broken and causes jobs to hang. Can anyone confirm this? -Original Message- From: Kilian Cavalotti [mailto:kilian.cavalotti.w...@gmail.com] Sent: Friday, October 27, 2017 8:1

[slurm-dev] Re: CPU/GPU Affinity Not Working

2017-10-27 Thread Kilian Cavalotti
On Fri, Oct 27, 2017 at 12:45 PM, Dave Sizer wrote: > Also, supposedly adding the "--accel-bind=g" option to srun will do this, > though we are observing that this is broken and causes jobs to hang. > > Can anyone confirm this? Not really, it doesn't seem to be hanging for us: -- 8< --

[slurm-dev] Re: CPU/GPU Affinity Not Working

2017-10-27 Thread Dave Sizer
Kilian, when you specify your CPU bindings in gres.conf, are you using the same IDs that show up in nvidia-smi? We noticed that our CPU IDs were being remapped from their nvidia-smi values by SLURM according to hwloc, so to get affinity working we needed to use these remapped values. I'm wonde

[slurm-dev] Fixing corrupted slurm accounting?

2017-10-27 Thread Bill Broadley
I noticed crazy high numbers in my reports, things like sreport user top: Top 10 Users 2017-10-20T00:00:00 - 2017-10-26T23:59:59 (604800 secs) Use reported in Percentage of Total Cluster Login Proper Name