Hi Ralph,
I think I've found the magic keys...
$ srun --ntasks-per-node=2 -N1 --cpu_bind=none env | grep BIND
SLURM_CPU_BIND_VERBOSE=quiet
SLURM_CPU_BIND_TYPE=none
SLURM_CPU_BIND_LIST=
SLURM_CPU_BIND=quiet,none
SLURM_CPU_BIND_VERBOSE=quiet
SLURM_CPU_BIND_TYPE=none
SLURM_CPU_BIND_LIST=
SLURM_CPU_BIND=quiet,none
$ srun --ntasks-per-node=2 -N1 --cpu_bind=core env | grep BIND
SLURM_CPU_BIND_VERBOSE=quiet
SLURM_CPU_BIND_TYPE=mask_cpu:
SLURM_CPU_BIND_LIST=0x1111,0x2222
SLURM_CPU_BIND=quiet,mask_cpu:0x1111,0x2222
SLURM_CPU_BIND_VERBOSE=quiet
SLURM_CPU_BIND_TYPE=mask_cpu:
SLURM_CPU_BIND_LIST=0x1111,0x2222
SLURM_CPU_BIND=quiet,mask_cpu:0x1111,0x2222
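For anyone reading the mask_cpu values above: each entry in SLURM_CPU_BIND_LIST is a hexadecimal CPU bitmask, one per local task, where bit N set means CPU N is in the task's binding. Here is a minimal sketch of decoding them. mask_to_cpus is a hypothetical helper written for this note, not something Slurm or Open MPI provides, and it assumes the mask fits in the shell's integer width (so it won't handle very wide nodes).

```bash
#!/bin/bash
# Hypothetical helper: print the CPU ids set in one hex mask, comma-separated.
# Assumes the mask fits in the shell's native integer (typically 64 bits).
mask_to_cpus() {
    local mask cpu cpus
    mask=$(( $1 ))      # shell arithmetic accepts 0x-prefixed hex
    cpu=0
    cpus=""
    while [ "$mask" -gt 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            cpus="${cpus:+$cpus,}$cpu"   # append with a comma separator
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "$cpus"
}

# Decode the list from the --cpu_bind=core run shown above.
list="0x1111,0x2222"
for m in $(echo "$list" | tr ',' ' '); do
    echo "$m -> CPUs $(mask_to_cpus "$m")"
done
```

So 0x1111 covers CPUs 0, 4, 8 and 12, and 0x2222 covers CPUs 1, 5, 9 and 13.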
Andy
On 10/27/2016 11:57 AM, r...@open-mpi.org wrote:
Hey Andy
Is there a SLURM envar that would tell us the binding option from the srun cmd
line? We automatically bind when direct launched due to user complaints of poor
performance if we don't. If the user specifies a binding option, then we
detect that we were already bound and don't do it.
However, if the user specifies that they not be bound, then we think they
simply didn't specify anything - and that isn't the case. If we can see
something that tells us "they explicitly said not to do it", then we can
avoid the situation.
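Going by the srun output quoted in this thread, SLURM_CPU_BIND_TYPE is "none" under --cpu_bind=none and "mask_cpu:" under --cpu_bind=core, so a check along the following lines could tell the three cases apart. This is only a sketch: slurm_bind_request is a hypothetical name, and it assumes the variable is unset or empty when the user gives no --cpu_bind option at all, which the thread does not show.

```bash
#!/bin/bash
# Hypothetical helper: classify the binding request seen in the environment.
# Prints one of: unspecified, none, explicit.
slurm_bind_request() {
    case "${SLURM_CPU_BIND_TYPE:-}" in
        "")   echo "unspecified" ;;  # assumption: unset when no --cpu_bind given
        none) echo "none" ;;         # user explicitly asked not to be bound
        *)    echo "explicit" ;;     # e.g. "mask_cpu:" from --cpu_bind=core
    esac
}

# Sketch of the decision described above.
case "$(slurm_bind_request)" in
    none)        echo "user asked for no binding; leave the process unbound" ;;
    explicit)    echo "srun already bound the process; nothing to do" ;;
    unspecified) echo "no request seen; apply the default binding" ;;
esac
```

Whether an unset variable really means "no option given" would need checking against the Slurm configuration in use, since site defaults can set a binding without any srun option.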
Ralph
On Oct 27, 2016, at 8:48 AM, Andy Riebs <andy.ri...@hpe.com> wrote:
Hi All,
We are running Open MPI version 1.10.2, built with support for Slurm version 16.05.0.
When a user specifies "--cpu_bind=none", MPI tries to bind by core, which
segv's if there are more processes than cores.
The user reports:
What I found is that
% srun --ntasks-per-node=8 --cpu_bind=none \
env SHMEM_SYMMETRIC_HEAP_SIZE=1024M bin/all2all.shmem.exe 0
will have the problem, but:
% srun --ntasks-per-node=8 --cpu_bind=none \
env SHMEM_SYMMETRIC_HEAP_SIZE=1024M ./bindit.sh bin/all2all.shmem.exe 0
will run as expected and print out the usage message because I didn't provide
the right arguments to the code.
So, it appears that the binding has something to do with the issue. My binding
script is as follows:
% cat bindit.sh
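#!/bin/bash
# Bind each local task to one CPU (local task id * stride), memory on NUMA node 0.
#echo SLURM_LOCALID=$SLURM_LOCALID
stride=1
if [ -n "$SLURM_LOCALID" ]; then
    bindCPU=$(( SLURM_LOCALID * stride ))
    # "$@" keeps each argument intact, even if it contains spaces
    exec numactl --membind=0 --physcpubind="$bindCPU" "$@"
fi
# SLURM_LOCALID is not set (not launched by srun): run the command unbound
"$@"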
%
--
Andy Riebs
andy.ri...@hpe.com
Hewlett-Packard Enterprise
High Performance Computing Software Engineering
+1 404 648 9024
My opinions are not necessarily those of HPE
May the source be with you!
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users