Thank you everybody. With your help I was able to resolve the issue. For
the sake of completeness, this is what I had to do:

infinipath-psm was already installed on my system when Open MPI was built
from source. However, infinipath-psm-devel was NOT installed. I suppose
that's why Open MPI could not add PSM support at build time. Following
Jeff's advice, I ran

ompi_info | grep psm

which showed no output, confirming that the PSM MTL had not been built.

I had to install infinipath-psm-devel and rebuild Open MPI. That fixed it.
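
In case it helps anyone who lands on this thread later, the ompi_info check can be scripted. What follows is only a rough sketch, not something from the Open MPI docs: has_psm_mtl is a made-up helper name, and the pattern assumes ompi_info prints component lines of the form "MCA mtl: psm (...)", which I believe is the 1.10.x layout but have not checked against other releases.

```shell
#!/bin/sh
# Sketch: succeed if ompi_info-style output on stdin lists the PSM MTL.
# has_psm_mtl is a hypothetical helper name, not part of Open MPI.
# Assumption: component lines look like "MCA mtl: psm (MCA v2.0.0, ...)".
has_psm_mtl() {
    # Require whitespace or end-of-line after "psm" so "psm2" does not match.
    grep -Eq 'MCA[[:space:]]+mtl:[[:space:]]+psm([[:space:]]|$)'
}

# Only run the real check when ompi_info is actually on the PATH.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_psm_mtl; then
        echo "PSM MTL is built in"
    else
        echo "PSM MTL is missing: install infinipath-psm-devel and rebuild"
    fi
fi
```

Because the helper reads stdin, it can also be run against ompi_info output saved from another node, which is handy when the two machines might not have been built the same way.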

Durga

Life is complex. It has real and imaginary parts.

On Thu, Mar 17, 2016 at 9:17 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> Additionally, if you run
>
>   ompi_info | grep psm
>
> Do you see the PSM MTL listed?
>
> To force the CM MTL, you can run:
>
>   mpirun --mca pml cm ...
>
> That won't let any BTLs be selected (because only ob1 uses the BTLs).
>
>
> > On Mar 17, 2016, at 8:07 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
> >
> > can you try to add
> > --mca mtl psm
> > to your mpirun command line ?
> >
> > you might also have to blacklist the openib btl
> >
> > Cheers,
> >
> > Gilles
> >
> > On Thursday, March 17, 2016, dpchoudh . <dpcho...@gmail.com> wrote:
> > Hello all
> > I have a simple test setup, consisting of two Dell workstation nodes
> > with similar hardware profiles.
> >
> > Both the nodes have (identical)
> > 1. Qlogic 4x DDR infiniband
> > 2. Chelsio C310 iWARP ethernet.
> >
> > Both of these cards are connected back to back, without a switch.
> >
> > With this setup, I can run OpenMPI over TCP and openib BTL. However, if
> > I try to use the PSM MTL (excluding the Chelsio NIC, of course, since it
> > does not support PSM), I get an error from one of the nodes (details
> > below), which makes me think that a required library or package is not
> > installed, but I can't figure out what it might be.
> >
> > Note that the test program is a simple 'hello world' program.
> >
> > The following work:
> >   mpirun -np 2 --hostfile ~/hostfile -mca btl tcp,self ./mpitest
> >   mpirun -np 2 --hostfile ~/hostfile -mca btl self,openib -mca btl_openib_if_exclude cxgb3_0 ./mpitest
> >
> > (I had to exclude the Chelsio card because of this issue:
> > https://www.open-mpi.org/community/lists/users/2016/03/28661.php  )
> >
> > Here is what does NOT work:
> >   mpirun -np 2 --hostfile ~/hostfile -mca mtl psm -mca btl_openib_if_exclude cxgb3_0 ./mpitest
> >
> > The error (from both nodes) is:
> >  mca: base: components_open: component pml / cm open function failed
> >
> > However, I still see the "Hello, world" output indicating that the
> > program ran to completion.
> >
> > Here is another command that does NOT work:
> >
> >   mpirun -np 2 --hostfile ~/hostfile -mca pml cm -mca btl_openib_if_exclude cxgb3_0 ./mpitest
> >
> > The error (from the root node) is:
> > PML cm cannot be selected
> >
> > However, this time, I see no output from the program, indicating it did
> > not run.
> >
> > The following command also fails in a similar way:
> >   mpirun -np 2 --hostfile ~/hostfile -mca pml cm -mca mtl psm -mca btl_openib_if_exclude cxgb3_0 ./mpitest
> >
> > I have verified that infinipath-psm is installed on both nodes. Both
> > nodes run identical CentOS 7 and the libraries were installed from the
> > CentOS repositories (i.e. were not compiled from source).
> >
> > Both nodes run OMPI 1.10.2, compiled from the source RPM.
> >
> > What am I doing wrong?
> >
> > Thanks
> > Durga
> >
> >
> >
> >
> > Life is complex. It has real and imaginary parts.
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/03/28725.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/03/28727.php
>
