Hi Ralph,
I haven't played around in this code, so I'll flip the question
over to the Slurm list, and report back here when I learn
anything.
Cheers
Andy
Sigh - of course it wouldn’t be simple :-(
All right, let’s suppose we look for SLURM_CPU_BIND:
* if it includes the word “none”, then we know the user specified that they don’t want us to bind
* if it includes the word mask_cpu, then we have to check the value of that option:
  * if it is all F’s, then they didn’t specify a binding and we should do our thing
  * if it is anything else, then we assume they _did_ specify a binding, and we leave it alone
Would that make sense? Is there anything else that
could be in that envar which would trip us up?
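A minimal sketch of that decision logic, assuming the SLURM_CPU_BIND values shown later in this thread (the function name and result strings are illustrative, not Open MPI's actual code):

```shell
# Hypothetical sketch of the proposed check; not Open MPI's code.
# Classifies a SLURM_CPU_BIND value into one of three outcomes.
decide_binding() {
    bind="$1"
    case "$bind" in
        *none*)
            # user explicitly asked not to be bound
            echo "user-said-no" ;;
        *mask_cpu:*)
            masks="${bind#*mask_cpu:}"
            # drop the 0x prefixes and commas, leaving only hex digits
            digits=$(printf '%s' "$masks" | sed 's/0x//g; s/,//g')
            case "$digits" in
                *[!Ff]*) echo "user-bound" ;;    # a real mask was given
                *)       echo "default-mask" ;;  # all F's: no binding requested
            esac ;;
        *)
            echo "unset" ;;
    esac
}

decide_binding "quiet,none"                   # user-said-no
decide_binding "quiet,mask_cpu:0xFFFF"        # default-mask
decide_binding "quiet,mask_cpu:0x1111,0x2222" # user-bound
```

Note the all-F's test only covers the "didn't specify" case if Slurm really emits a full mask by default, which is exactly the open question here.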
Yes, they still exist:
$ srun --ntasks-per-node=2 -N1 env | grep BIND | sort -u
SLURM_CPU_BIND_LIST=0xFFFF
SLURM_CPU_BIND=quiet,mask_cpu:0xFFFF
SLURM_CPU_BIND_TYPE=mask_cpu:
SLURM_CPU_BIND_VERBOSE=quiet
Here are the relevant Slurm configuration
options that could conceivably change the behavior
from system to system:
SelectType = select/cons_res
SelectTypeParameters = CR_CPU
And if there is no --cpu_bind on the cmd line? Do these not exist?
Hi Ralph,
I think I've found the magic keys...
$ srun --ntasks-per-node=2 -N1 --cpu_bind=none env | grep BIND
SLURM_CPU_BIND_VERBOSE=quiet
SLURM_CPU_BIND_TYPE=none
SLURM_CPU_BIND_LIST=
SLURM_CPU_BIND=quiet,none
SLURM_CPU_BIND_VERBOSE=quiet
SLURM_CPU_BIND_TYPE=none
SLURM_CPU_BIND_LIST=
SLURM_CPU_BIND=quiet,none
$ srun --ntasks-per-node=2 -N1 --cpu_bind=core env | grep BIND
SLURM_CPU_BIND_VERBOSE=quiet
SLURM_CPU_BIND_TYPE=mask_cpu:
SLURM_CPU_BIND_LIST=0x1111,0x2222
SLURM_CPU_BIND=quiet,mask_cpu:0x1111,0x2222
SLURM_CPU_BIND_VERBOSE=quiet
SLURM_CPU_BIND_TYPE=mask_cpu:
SLURM_CPU_BIND_LIST=0x1111,0x2222
SLURM_CPU_BIND=quiet,mask_cpu:0x1111,0x2222
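Going by those dumps, SLURM_CPU_BIND_TYPE alone would distinguish the two cases; a hedged sketch (the policy strings are illustrative, and whether the variable exists at all without --cpu_bind is still the open question above):

```shell
# Sketch only: classify based on SLURM_CPU_BIND_TYPE, which the dumps
# above show as "none" for --cpu_bind=none and "mask_cpu:" for
# --cpu_bind=core. The returned policy strings are made up.
bind_policy() {
    case "${1:-}" in
        none*)     echo "skip" ;;     # user explicitly opted out
        mask_cpu*) echo "bound" ;;    # Slurm already applied a mask
        *)         echo "default" ;;  # no directive: apply our default
    esac
}

bind_policy "none"      # skip
bind_policy "mask_cpu:" # bound
bind_policy ""          # default
```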
Andy
On 10/27/2016 11:57 AM, r...@open-mpi.org wrote:
Hey Andy
Is there a SLURM envar that would tell us
the binding option from the srun cmd line?
We automatically bind when direct launched
due to user complaints of poor performance
if we don’t. If the user specifies a
binding option, then we detect that we were
already bound and don’t do it.
However, if the user specifies that they not be bound, then we think they simply didn’t specify anything - and that isn’t the case. If we can see something that tells us “they explicitly said not to do it”, then we can avoid the situation.
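For context, the "detect that we were already bound" step can be approximated from outside MPI. A Linux-only sketch, with made-up helper and message names: a process can guess it was bound externally when its CPU affinity covers fewer CPUs than are online.

```shell
# Illustrative, Linux-only sketch; not Open MPI's actual detection code.
count_cpus() {
    # expand a cpulist such as "0-3,8" into a count of CPUs
    echo "$1" | tr ',' '\n' | awk -F- '{ n += (NF == 2) ? $2 - $1 + 1 : 1 } END { print n }'
}

if [ -r /proc/self/status ]; then
    allowed=$(count_cpus "$(awk '/Cpus_allowed_list/ {print $2}' /proc/self/status)")
    online=$(count_cpus "$(cat /sys/devices/system/cpu/online)")
    if [ "$allowed" -lt "$online" ]; then
        echo "looks externally bound"
    else
        echo "not bound"
    fi
fi
```

This only says *that* a binding exists, not *who* asked for it - which is why the envar check above is still needed to tell "user said none" apart from "user said nothing".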
Ralph
On Oct 27, 2016, at 8:48 AM, Andy Riebs <andy.ri...@hpe.com> wrote:
Hi All,
We are running Open MPI version 1.10.2,
built with support for Slurm version
16.05.0. When a user specifies
"--cpu_bind=none", MPI tries to bind by
core, which segv's if there are more
processes than cores.
The user reports:
What I found is that
% srun --ntasks-per-node=8 --cpu_bind=none \
    env SHMEM_SYMMETRIC_HEAP_SIZE=1024M bin/all2all.shmem.exe 0
will have the problem, but:
% srun --ntasks-per-node=8 --cpu_bind=none \
    env SHMEM_SYMMETRIC_HEAP_SIZE=1024M ./bindit.sh bin/all2all.shmem.exe 0
will run as expected and print out the usage message because I didn’t provide the right arguments to the code.
So, it appears that the binding has
something to do with the issue. My binding
script is as follows:
% cat bindit.sh
#!/bin/bash
#echo SLURM_LOCALID=$SLURM_LOCALID
stride=1
if [ ! -z "$SLURM_LOCALID" ]; then
    let bindCPU=$SLURM_LOCALID*$stride
    # "$@" preserves argument quoting, unlike $*
    exec numactl --membind=0 --physcpubind=$bindCPU "$@"
fi
"$@"
%
--
Andy Riebs
andy.ri...@hpe.com
Hewlett-Packard Enterprise
High Performance Computing Software
Engineering
+1 404 648 9024
My opinions are not necessarily those of
HPE
May the source be with you!
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users