Hey Andy,

Is there a SLURM envar that would tell us the binding option from the srun cmd line? We automatically bind when direct launched due to user complaints of poor performance if we don’t. If the user specifies a binding option, then we detect that we were already bound and don’t do it.
However, if the user specifies that they not be bound, then we think they simply didn’t specify anything - and that isn’t the case. If we can see something that tells us “they explicitly said not to do it”, then we can avoid the situation.

Ralph

> On Oct 27, 2016, at 8:48 AM, Andy Riebs <andy.ri...@hpe.com> wrote:
>
> Hi All,
>
> We are running Open MPI version 1.10.2, built with support for Slurm version
> 16.05.0. When a user specifies "--cpu_bind=none", MPI tries to bind by core,
> which segv's if there are more processes than cores.
>
> The user reports:
>
> What I found is that
>
> % srun --ntasks-per-node=8 --cpu_bind=none \
>     env SHMEM_SYMMETRIC_HEAP_SIZE=1024M bin/all2all.shmem.exe 0
>
> will have the problem, but:
>
> % srun --ntasks-per-node=8 --cpu_bind=none \
>     env SHMEM_SYMMETRIC_HEAP_SIZE=1024M ./bindit.sh bin/all2all.shmem.exe 0
>
> Will run as expected and print out the usage message because I didn’t provide
> the right arguments to the code.
>
> So, it appears that the binding has something to do with the issue. My
> binding script is as follows:
>
> % cat bindit.sh
> #!/bin/bash
>
> #echo SLURM_LOCALID=$SLURM_LOCALID
>
> stride=1
>
> if [ ! -z "$SLURM_LOCALID" ]; then
>   let bindCPU=$SLURM_LOCALID*$stride
>   exec numactl --membind=0 --physcpubind=$bindCPU $*
> fi
>
> $*
>
> %
>
> --
> Andy Riebs
> andy.ri...@hpe.com
> Hewlett-Packard Enterprise
> High Performance Computing Software Engineering
> +1 404 648 9024
> My opinions are not necessarily those of HPE
> May the source be with you!
>
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
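
For illustration only, the decision described in the reply above could look roughly like the sketch below. It is not what Open MPI does. It assumes the requested bind type reaches the task in an environment variable; SLURM_CPU_BIND_TYPE is used here because I believe the srun man page lists it among the variables set for the task, but whether it is set to "none" in this case (or set at all when no option was given) is an assumption to check on the installation. The helper name should_autobind is hypothetical.

```shell
#!/bin/bash
# Sketch only: decide whether a direct-launched process should apply its
# own default binding. should_autobind is a hypothetical helper, not part
# of Open MPI or Slurm.
#
# $1: "yes" if the process already has a restricted CPU affinity, else "no"
# $2: the bind type reported by the launcher (assumed to come from
#     SLURM_CPU_BIND_TYPE; may be empty if nothing was exported)
should_autobind() {
    local already_bound=$1
    local bind_type=$2

    # The user asked for a binding and srun applied it: leave it alone.
    if [ "$already_bound" = "yes" ]; then
        echo "skip: already bound"
        return 1
    fi

    # The user explicitly asked not to be bound: respect that.
    case "$bind_type" in
        none*)
            echo "skip: user requested no binding"
            return 1
            ;;
    esac

    # Nothing was specified: fall back to the default binding.
    echo "bind: no user preference detected"
    return 0
}

# Example: unbound process, explicit "none" from the user.
should_autobind "no" "none" || true
```

In a real job the second argument would be something like "${SLURM_CPU_BIND_TYPE:-}", so that a missing variable is treated as "nothing specified" and the existing default binding still applies.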