Boris,

Open MPI should automatically detect the InfiniBand hardware and use the openib BTL (and *not* tcp) for inter-node communications, and a shared-memory-optimized BTL (e.g. sm or vader) for intra-node communications.
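Auto-detection can only pick openib if your Open MPI build actually contains that component, which is worth checking with a static build. A minimal sketch, assuming ompi_info lists components as lines like "MCA btl: openib (MCA v2.0.0, ...)"; has_btl is a hypothetical helper, not an Open MPI tool:

```shell
# has_btl NAME: reads `ompi_info` output on stdin and succeeds if the
# BTL component NAME is listed. The trailing space keeps "sm" from
# matching "smcuda".
has_btl() {
  grep -q "MCA btl: $1 "
}
```

For example: /usr/local/open-mpi/1.10.7/bin/ompi_info | has_btl openib || echo "openib BTL not built"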


Note that if you pass "-mca btl openib,self", you tell Open MPI to use the openib BTL between all tasks, including tasks running on the same node (which is less efficient than using sm or vader).
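If you do want to list the BTLs yourself, a sketch that keeps shared memory for on-node traffic could look like this, assuming your 1.10.7 build includes the openib and vader components (sm is the older shared-memory alternative):

```shell
# openib between nodes, vader (shared memory) within a node,
# self for a process sending to itself
mpiexec --mca btl openib,vader,self --hostfile hostfile5 -n 200 ./DoWork
```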


First, I suggest you make sure InfiniBand is up and running on all your nodes.

(Just run ibstat: at least one port should be listed, its state should be Active, and all nodes should report the same SM lid.)
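A minimal sketch of that check across the cluster, assuming passwordless ssh to the nodes (which Open MPI's default ssh launcher typically needs anyway) and the usual ibstat layout with "State:" and "SM lid:" lines per port; ib_summary and check_nodes are hypothetical helper names:

```shell
# ib_summary: reads ibstat output on stdin, prints each port's
# "State:" and "SM lid:" lines, and succeeds only if at least one
# port reports "State: Active".
ib_summary() {
  out=$(cat)
  printf '%s\n' "$out" | grep -E '^[[:space:]]*(State|SM lid):'
  printf '%s\n' "$out" | grep -q 'State: Active'
}

# check_nodes NODE...: runs ibstat on each node over ssh and reports
# the summary, so the SM lid values can be compared by eye.
check_nodes() {
  for node in "$@"; do
    echo "== $node"
    ssh "$node" ibstat | ib_summary || echo "$node: no Active port"
  done
}
```

For example: check_nodes node01 node02 node03 node04 node05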


Then try to run two tasks on two nodes (one task per node).
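A sketch of that two-node test, assuming node01 and node02 from your hostfile and mpirun on your PATH (if DoWork needs more tasks, any small MPI program will do). With only openib and self allowed, the two tasks can only talk over InfiniBand, so the run fails if InfiniBand is not usable:

```shell
# two tasks, one per node (--map-by node); inter-node traffic must use openib
mpirun --mca btl openib,self --map-by node -host node01,node02 -n 2 ./DoWork
```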


If this does not work, you can run

mpirun --mca btl_base_verbose 100 ...

and post the logs so we can investigate from there.
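For example, the verbose output of the two-node run can be captured into a file to post (a sketch; the log file name is arbitrary):

```shell
# btl_base_verbose makes the BTL framework report which components
# it opens and initializes, and which ones fail
mpirun --mca btl_base_verbose 100 --map-by node -host node01,node02 -n 2 ./DoWork 2>&1 | tee btl_verbose.log
```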


Cheers,


Gilles



On 7/14/2017 6:43 AM, Boris M. Vulovic wrote:

I would like to know how to use the InfiniBand hardware on a CentOS 6.x cluster with Open MPI (static libs) to run my C++ code. This is how I compile and run:

/usr/local/open-mpi/1.10.7/bin/mpic++ -L/usr/local/open-mpi/1.10.7/lib -Bstatic main.cpp -o DoWork

/usr/local/open-mpi/1.10.7/bin/mpiexec -mca btl tcp,self --hostfile hostfile5 -host node01,node02,node03,node04,node05 -n 200 DoWork

Here, "-mca btl tcp,self" means that TCP is used, even though the cluster has InfiniBand.

What should be changed in the compile and run commands so that InfiniBand is used? If I just replace "-mca btl tcp,self" with "-mca btl openib,self", I get plenty of errors, the relevant one being:

At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL.

Thanks very much!!!


Boris




_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users