Can you please send me the allocation request you made (so I can see what you specified on the cmd line), and the mpirun cmd line?
Thanks
Ralph

> On Oct 2, 2015, at 8:25 AM, Marcin Krotkiewski <marcin.krotkiew...@gmail.com> wrote:
>
> Hi,
>
> I fail to make Open MPI bind to cores correctly when running from within
> SLURM-allocated CPU resources spread over a range of compute nodes in an
> otherwise homogeneous cluster. I have found this thread
>
> http://www.open-mpi.org/community/lists/users/2014/06/24682.php
>
> and tried what Ralph suggested there (--hetero-nodes), but it does not
> work (v. 1.10.0). When running with --report-bindings I get messages like
>
> [compute-9-11.local:27571] MCW rank 10 is not bound (or bound to all
> available processors)
>
> for all ranks outside of my first physical compute node. Moreover, everything
> works as expected if I ask SLURM to assign entire compute nodes. So it does
> look like Ralph's diagnosis in that thread is correct; the --hetero-nodes
> switch just does not work for me.
>
> I have written a short code that uses sched_getaffinity to print the
> effective bindings: all MPI ranks except those on the first node are bound
> to all CPU cores allocated by SLURM.
>
> Do I have to do something besides --hetero-nodes, or is this a problem that
> needs further investigation?
>
> Thanks a lot!
>
> Marcin
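[Editor's note: for readers following along, here is a minimal sketch of the kind of affinity checker Marcin describes. This is a reconstruction, not his actual code; the file and program names are illustrative. Each rank queries its effective CPU mask with sched_getaffinity() and prints the cores it is allowed to run on, so a rank bound to "all available processors" will print every core SLURM allocated on its node instead of a single core.]

/* check_affinity.c: each MPI rank prints the CPU cores its process
 * may run on, as reported by sched_getaffinity(). */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, hostlen;
    char host[MPI_MAX_PROCESSOR_NAME];
    cpu_set_t mask;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &hostlen);

    /* Query the effective CPU affinity mask of the calling process (pid 0). */
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_getaffinity");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* Print every core present in the mask, e.g. "rank 10 on node: 0 1 2 3". */
    printf("rank %d on %s bound to:", rank, host);
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &mask))
            printf(" %d", cpu);
    }
    printf("\n");

    MPI_Finalize();
    return 0;
}

[Compiled and launched inside the SLURM allocation, e.g.

mpicc -o check_affinity check_affinity.c
mpirun --hetero-nodes --report-bindings ./check_affinity

a correctly bound rank should list only its own core (or that core's hardware threads), while the symptom described above shows ranks beyond the first node listing the full set of cores SLURM allocated.]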