Can you try adding --mca mtl psm to your mpirun command line? You might also have to blacklist the openib btl.
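
As a rough sketch, the combined command line might look like the one below. The process count, hostfile path and binary name are copied from the original post quoted further down; `^openib` is Open MPI's exclusion syntax for a component list. Whether excluding openib is actually needed on this setup is an assumption, so the snippet only prints the command so it can be checked before running it.

```shell
# Sketch only: assemble the suggested mpirun invocation and print it.
# -np, the hostfile and ./mpitest come from the original post;
# "--mca btl ^openib" excludes the openib BTL (assumed to be the
# component that needs blacklisting here).
MCA_OPTS="--mca mtl psm --mca btl ^openib"
CMD="mpirun -np 2 --hostfile ~/hostfile $MCA_OPTS ./mpitest"

# Print the command; paste it into the shell once it looks right.
echo "$CMD"
```

Note that `^` negates the whole list, so it cannot be combined with an inclusive list such as `tcp,self` in the same `btl` value.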

Cheers,

Gilles

On Thursday, March 17, 2016, dpchoudh . <dpcho...@gmail.com> wrote:

> Hello all
>
> I have a simple test setup, consisting of two Dell workstation nodes with
> similar hardware profile.
>
> Both the nodes have (identical)
> 1. Qlogic 4x DDR infiniband
> 2. Chelsio C310 iWARP ethernet.
>
> Both of these cards are connected back to back, without a switch.
>
> With this setup, I can run OpenMPI over TCP and openib BTL. However, if I
> try to use the PSM MTL (excluding the Chelsio NIC, of course, since it does
> not support PSM), I get an error from one of the nodes (details below),
> which makes me think that a required library or package is not installed,
> but I can't figure out what it might be.
>
> Note that the test program is a simple 'hello world' program.
>
> The following work:
> mpirun -np 2 --hostfile ~/hostfile -mca btl tcp,self ./mpitest
> mpirun -np 2 --hostfile ~/hostfile -mca btl self,openib -mca
> btl_openib_if_exclude cxgb3_0 ./mpitest
>
> (I had to exclude the Chelsio card because of this issue:
> https://www.open-mpi.org/community/lists/users/2016/03/28661.php )
>
> Here is what does NOT work:
> mpirun -np 2 --hostfile ~/hostfile -mca mtl psm -mca btl_openib_if_exclude
> cxgb3_0 ./mpitest
>
> The error (from both nodes) is:
> mca: base: components_open: component pml / cm open function failed
>
> However, I still see the "Hello, world" output indicating that the program
> ran to completion.
>
> Here is also another command that does NOT work:
>
> mpirun -np 2 --hostfile ~/hostfile -mca pml cm -mca btl_openib_if_exclude
> cxgb3_0 ./mpitest
>
> The error is: (from the root node)
> PML cm cannot be selected
>
> However, this time, I see no output from the program, indicating it did
> not run.
>
> The following command also fails in a similar way:
> mpirun -np 2 --hostfile ~/hostfile -mca pml cm -mca mtl psm -mca
> btl_openib_if_exclude cxgb3_0 ./mpitest
>
> I have verified that infinipath-psm is installed on both nodes. Both nodes
> run identical CentOS 7 and the libraries were installed from the CentOS
> repositories (i.e. were not compiled from source)
>
> Both nodes run OMPI 1.10.2, compiled from the source RPM.
>
> What am I doing wrong?
>
> Thanks
> Durga
>
> Life is complex. It has real and imaginary parts.