On Monday, March 21, 2011 12:25:37 pm Dave Love wrote:
> I'm trying to test some new nodes with ConnectX adaptors, and failing to
> get (so far just) IMB to run on them.
...
> I'm using gcc-compiled OMPI 1.4.3 and the current RedHat 5 OFED with IMB
> 3.2.2, specifying `btl openib,sm,self' (or `mtl psm' on the Qlogic
> nodes).  I'm not sure what else might be relevant.  The output from
> trying to run IMB follows, for what it's worth.
> 
>  
> --------------------------------------------------------------------------
> At least one pair of MPI processes are unable to reach each other for MPI
> communications.  This means that no Open MPI device has indicated that it
> can be used to communicate between these processes.  This is an error;
> Open MPI requires that all MPI processes be able to reach each other. 
> This error can sometimes be the result of forgetting to specify the "self"
> BTL.
> 
>     Process 1 ([[25307,1],2]) is on host: lvgig116
>     Process 2 ([[25307,1],12]) is on host: lvgig117
>     BTLs attempted: self sm

Are you sure you launched it correctly and that you have (re)built Open MPI
against your RedHat 5 IB stack?
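
A quick way to check the last point is to ask ompi_info which BTL components
were actually built (a rough sketch, assuming the ompi_info from that build is
first in your PATH):

  $ ompi_info | grep btl

If no "openib" line shows up there, the build never found the verbs
headers/libraries and the openib btl can't be selected at run time.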
 
>   Your MPI job is now going to abort; sorry.
...
>   [lvgig116:07931] 19 more processes have sent help message
> help-mca-bml-r2.txt / unreachable proc [lvgig116:07931] Set MCA parameter
> "orte_base_help_aggregate" to 0 to see all help / error messages
> [lvgig116:07931] 19 more processes have sent help message help-mpi-runtime
> / mpi_init:startup:internal-failure

Seems to me that Open MPI gave up because it didn't succeed in initializing
any inter-node btl/mtl; note that the error above only lists "self sm" under
"BTLs attempted", so the openib btl was never brought up at all.
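
Running with some BTL verbosity usually tells you why openib was skipped;
roughly something like this (only a sketch, adjust the benchmark path and
process count to your setup):

  $ mpirun --mca btl openib,sm,self --mca btl_base_verbose 30 -np 2 ./IMB-MPI1 PingPong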

I'd suggest you try (roughly in order):

 1) run ibstat on all nodes to verify that your IB ports are up
 2) run a verbs-level test (like ib_write_bw) between two nodes to verify that
    data can flow (see the example below)
 3) make sure your Open MPI was built with the RedHat libibverbs-devel package
    present (=> a suitable openib btl gets built).
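
For 1) and 2), roughly (assuming the OFED infiniband-diags and perftest
packages are installed; the hostnames are just the two from your error
output):

  # on every node, each port should report State: Active
  $ ibstat

  # verbs-level bandwidth test: start the server side first, then the client
  lvgig117$ ib_write_bw
  lvgig116$ ib_write_bw lvgig117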

/Peter

> "orte_base_help_aggregate" to 0 to see all help / error messages
> [lvgig116:07931] 19 more processes have sent help message help-mpi-runtime
> / mpi_init:startup:internal-failure
