Can you please send me the allocation request you made (so I can see what you 
specified on the cmd line), and the mpirun cmd line?

Thanks
Ralph

> On Oct 2, 2015, at 8:25 AM, Marcin Krotkiewski <marcin.krotkiew...@gmail.com> 
> wrote:
> 
> Hi,
> 
> I am unable to make Open MPI bind to cores correctly when running within 
> SLURM-allocated CPU resources spread over a range of compute nodes in an 
> otherwise homogeneous cluster. I have found this thread
> 
> http://www.open-mpi.org/community/lists/users/2014/06/24682.php
> 
> and tried what Ralph suggested there (--hetero-nodes), but it does not work 
> (v. 1.10.0). When running with --report-bindings I get messages like
> 
> [compute-9-11.local:27571] MCW rank 10 is not bound (or bound to all 
> available processors)
> 
> for all ranks outside of my first physical compute node. Moreover, everything 
> works as expected if I ask SLURM to assign entire compute nodes. So it does 
> look like Ralph's diagnosis in that thread is correct; it is just that the 
> --hetero-nodes switch does not work for me.
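> 
> For reference, the invocation has this general shape (illustrative only, not 
> my exact command lines):
> 
>   salloc --ntasks=32            # SLURM scatters the granted cores across several nodes
>   mpirun --hetero-nodes --bind-to core --report-bindings ./a.out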
> 
> I have written a short program that uses sched_getaffinity to print the 
> effective bindings: all MPI ranks except those on the first node are bound 
> to all CPU cores allocated by SLURM.
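> 
> A minimal sketch of the kind of check I mean (simplified and reconstructed 
> here, not my exact test code):
> 
>   #define _GNU_SOURCE          /* for sched_getaffinity and the CPU_* macros */
>   #include <sched.h>
>   #include <stdio.h>
>   #include <unistd.h>
>   #include <mpi.h>
> 
>   int main(int argc, char **argv)
>   {
>       MPI_Init(&argc, &argv);
>       int rank;
>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> 
>       char host[256];
>       gethostname(host, sizeof(host));
> 
>       /* Query the affinity mask the launcher left this process with
>          (pid 0 means the calling process). */
>       cpu_set_t mask;
>       CPU_ZERO(&mask);
>       sched_getaffinity(0, sizeof(mask), &mask);
> 
>       /* Collect the set cores into one line per rank. */
>       char buf[8192] = "";
>       int off = 0;
>       for (int c = 0; c < CPU_SETSIZE && off < (int)sizeof(buf) - 16; c++)
>           if (CPU_ISSET(c, &mask))
>               off += snprintf(buf + off, sizeof(buf) - off, " %d", c);
> 
>       printf("rank %d on %s: bound to cores%s\n", rank, host, buf);
> 
>       MPI_Finalize();
>       return 0;
>   }
> 
> Compiled with mpicc and launched through the same mpirun line, each rank then 
> reports the cores in its effective affinity mask.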
> 
> Do I have to do something besides --hetero-nodes, or is this a problem that 
> needs further investigation?
> 
> Thanks a lot!
> 
> Marcin
> 
