Hi everyone,
My objective: I want to assign a few tasks to the logical CPUs belonging to a
particular socket (e.g., say socket 0), and at another time I want to assign
another set of tasks to the logical CPUs belonging to a different socket (e.g.,
say socket 1). In summary, I want to achieve task affinity to a particular
socket.
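To make the intent concrete, a minimal sketch of the kind of pinning I mean
(assumptions: socket 0 owns CPUs 0-9, ./my_task is a placeholder binary, and
taskset/numactl are available):

  # outside Slurm: pin a command to the CPUs of socket 0
  taskset -c 0-9 ./my_task
  # or bind by NUMA node, which on most machines corresponds to a socket
  numactl --cpunodebind=0 --membind=0 ./my_task
  # within Slurm: bind each task of the step at socket granularity
  srun --cpu_bind=verbose,sockets ./my_task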
On Thu, Oct 26, 2017 at 1:39 PM, Kilian Cavalotti wrote:
> and for a 4-GPU node which has a gres.conf like this (don't ask, some
> vendors like their CPU ids alternating between sockets):
>
> NodeName=sh-114-03 Name=gpu File=/dev/nvidia[0-1]
> CPUs=0,2,4,6,8,10,12,14,16,18
> NodeName=sh-114-
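(For readers unfamiliar with the syntax, this is roughly what such a gres.conf
looks like on a node whose CPU ids alternate between sockets. The node name,
the second File= range and the odd CPU list below are illustrative assumptions,
not the actual sh-114-03 config:

  # gres.conf: map each pair of GPUs to the CPUs of its local socket
  NodeName=node01 Name=gpu File=/dev/nvidia[0-1] CPUs=0,2,4,6,8,10,12,14,16,18
  NodeName=node01 Name=gpu File=/dev/nvidia[2-3] CPUs=1,3,5,7,9,11,13,15,17,19
)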
Hi Michael,
On Fri, Oct 27, 2017 at 4:44 AM, Michael Di Domenico wrote:
> as an aside, is there some tool which provides the optimal mapping of
> CPU id's to GPU cards?
We use nvidia-smi:
-- 8< --
# nvidia-
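(The paste was cut off; presumably the command is the topology matrix query,
which prints a "CPU Affinity" column for each GPU. Illustrative output only,
not the actual sh-114-03 paste:

  # nvidia-smi topo -m
          GPU0    GPU1    CPU Affinity
  GPU0     X      PHB     0,2,4,6,8,10,12,14,16,18
  GPU1    PHB      X      0,2,4,6,8,10,12,14,16,18
)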
Hello,
hwloc 2 will be released sooner or later, and it introduces a few API
changes. The build log for version slurm-llnl-17.02.7 of Slurm shows:
../../../../../src/plugins/task/cgroup/task_cgroup_cpuset.c: In function
'_get_cpuinfo':
../../../../../src/plugins/task/cgroup/task_cgroup_cpuset.
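(The rest of the error is cut off above, but the usual cause is Slurm 17.02
being compiled against hwloc 2's changed API. A quick way to check which hwloc
the build picks up, plus one possible workaround; the hwloc 1.x install prefix
below is an assumption:

  # which hwloc does the build environment see?
  pkg-config --modversion hwloc
  # Slurm 17.02 predates the hwloc 2.0 API, so pointing configure at a
  # hwloc 1.x installation is one way around the breakage:
  ./configure --with-hwloc=/opt/hwloc-1.11
)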
Also, supposedly adding the "--accel-bind=g" option to srun will do this,
though we are observing that this is broken and causes jobs to hang.
Can anyone confirm this?
On Fri, Oct 27, 2017 at 12:45 PM, Dave Sizer wrote:
> Also, supposedly adding the "--accel-bind=g" option to srun will do this,
> though we are observing that this is broken and causes jobs to hang.
>
> Can anyone confirm this?
Not really, it doesn't seem to be hanging for us:
-- 8< --
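(The pasted test got snipped; for context, the sort of invocation being
discussed is simply an --accel-bind=g step that runs to completion, e.g. with
placeholder sizing:

  $ srun -N1 -n1 --gres=gpu:1 --accel-bind=g nvidia-smi -L
)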
Kilian, when you specify your CPU bindings in gres.conf, are you using the same
IDs that show up in nvidia-smi?
We noticed that our CPU IDs were being remapped from their nvidia-smi values by
SLURM according to hwloc, so to get affinity working we needed to use these
remapped values.
I'm wonde
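(One way to see the remapping Dave describes: hwloc's lstopo prints both its
own logical index, L#, and the OS/physical index, P#, for every PU, and the
gres.conf CPU ids apparently need to follow the logical numbering rather than
the physical one nvidia-smi reports. Illustrative output only; the P# values
are made up:

  $ lstopo --no-io
  Package L#0
    Core L#0
      PU L#0 (P#0)
      PU L#1 (P#20)
  ...
)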
I noticed crazy high numbers in my reports, things like sreport user top:
Top 10 Users 2017-10-20T00:00:00 - 2017-10-26T23:59:59 (604800 secs)
Usage reported in Percentage of Total
Cluster Login Proper Name
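(The report above is cut off; for context, the kind of command that produces
it, with the date range taken from the header and standard sreport options:

  $ sreport user topusage start=2017-10-20 end=2017-10-27 topcount=10 -t percent
)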