I'm trying to get Slurm and Open MPI to cooperate when running multi-threaded jobs. I'm sure I'm doing something wrong, but I can't figure out what.
My node configuration is 2 nodes, 2 sockets per node, 6 cores per socket. I want to run:

```
sbatch -N2 -n 8 --ntasks-per-node=4 --cpus-per-task=3 -w node1,node2 program.sbatch
```

Inside program.sbatch I'm calling Open MPI:

```
mpirun -n $SLURM_NTASKS --report-bindings program
```

When the bindings report comes out I get:

```
node1 rank 0 socket 0 core 0
node1 rank 1 socket 1 core 6
node1 rank 2 socket 0 core 1
node1 rank 3 socket 1 core 7
node2 rank 4 socket 0 core 0
node2 rank 5 socket 1 core 6
node2 rank 6 socket 0 core 1
node2 rank 7 socket 1 core 7
```

which is semi-fine, but when the job runs, the resulting threads from the program are locked (according to top) to those eight cores rather than spreading themselves over the 24 cores available. I tried a few incantations of --map-by, --bind-to, etc., but Open MPI basically complained about everything I tried for one reason or another. My understanding is that Slurm should be passing the requested configuration to Open MPI (or Open MPI is pulling it from the environment somehow) and it should magically work.

If I skip Slurm and run:

```
mpirun -n 8 --map-by node:pe=3 -bind-to core -host node1,node2 --report-bindings program
```

I get the behavior I want (though I would prefer an -npernode switch in there, but Open MPI complains about that too):

```
node1 rank 0 socket 0 core 0
node2 rank 1 socket 0 core 0
node1 rank 2 socket 0 core 3
node2 rank 3 socket 0 core 3
node1 rank 4 socket 1 core 6
node2 rank 5 socket 1 core 6
node1 rank 6 socket 1 core 9
node2 rank 7 socket 1 core 9
```

The bindings look better and the threads are not locked to particular cores. So I'm pretty sure this is a problem between Open MPI and Slurm, and not necessarily with either one individually. I did compile Open MPI with the Slurm support switch, and we're using the cgroup TaskPlugin within Slurm.

Ancillary to this: is there a way to turn off the core binding/placement routines entirely and control the placement manually?
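For completeness, here's a stripped-down sketch of what program.sbatch amounts to. The OMP_NUM_THREADS line is only there to illustrate that each rank is meant to run $SLURM_CPUS_PER_TASK (= 3) threads; the real script and program differ in the details, and "program" is a placeholder for our actual multi-threaded MPI binary:

```bash
#!/bin/bash
# Resource flags (-N2 -n 8 --ntasks-per-node=4 --cpus-per-task=3 -w node1,node2)
# are passed on the sbatch command line rather than as #SBATCH directives.

# Illustrative: each rank should spawn as many threads as cores allocated per task.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Launch one rank per Slurm task and report where each rank gets bound.
mpirun -n $SLURM_NTASKS --report-bindings program
```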