Ompi failing on mx only

Hi, Gary-
This looks like a config problem rather than a code problem so far.  Could you 
send the output of mx_info from node-1 and from node-2?  Also, forgive me for 
counter-asking a possibly dumb OMPI question, but is "-x LD_LIBRARY_PATH" 
really what you want, as opposed to "-x LD_LIBRARY_PATH=${LD_LIBRARY_PATH}"?  
(I would not be surprised if not specifying a value defaults to this behavior, 
but I have to ask.)
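
To make the two forms concrete, here is a small sketch that only prints the 
command lines and does not launch anything. As I read the mpirun man page, 
-x accepts either the name of an existing variable or a name=value pair, but 
please check that against your installed version. The names COMMON, LIBPATH, 
cmd_by_name and cmd_by_value are mine, made up for illustration; everything 
else is copied from your command.

```shell
# Sketch only: prints the two command lines instead of running them.
# Prefix, hostfile, and program name are copied from the original command.
COMMON="--prefix /usr/local/openmpi-1.2b2 --hostfile ./h1-3 -np 2 --mca btl mx,self ./cpi"

# Value in the launching shell; empty if the variable is unset.
LIBPATH="${LD_LIBRARY_PATH:-}"

# Form 1: pass the variable by name only.
cmd_by_name="mpirun -x LD_LIBRARY_PATH $COMMON"

# Form 2: pass an explicit name=value pair.
cmd_by_value="mpirun -x LD_LIBRARY_PATH=$LIBPATH $COMMON"

echo "$cmd_by_name"
echo "$cmd_by_value"
```

If the second line prints an empty value after the "=", the variable is not 
set in the shell you launch from, which would be worth knowing either way.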

Also, have you tried MX MTL as opposed to BTL?  --mca pml cm --mca mtl mx,self  
(it looks like you did)
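
For a side-by-side comparison of the two transports, something like the 
following sketch may help. The run_cpi helper and the DRY_RUN switch are 
hypothetical names of my own, not part of Open MPI; the mpirun options are 
taken verbatim from this thread. By default it only prints the commands.

```shell
# Hypothetical helper: builds the mpirun command from the original post
# and inserts the transport-selection flags passed as arguments.
# DRY_RUN=1 (the default) only prints the command; DRY_RUN=0 executes it.
run_cpi() {
    set -- mpirun --prefix /usr/local/openmpi-1.2b2 -x LD_LIBRARY_PATH \
        --hostfile ./h1-3 -np 2 "$@" ./cpi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# MX through the BTL, as in the failing run.
btl_cmd=$(run_cpi --mca btl mx,self)

# MX through the MTL, using the flags quoted above.
mtl_cmd=$(run_cpi --mca pml cm --mca mtl mx,self)

echo "$btl_cmd"
echo "$mtl_cmd"
```

Running both variants on the same pair of nodes and comparing where each one 
fails should show whether the problem follows the BTL or sits below it.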

"[node-2:10464] mx_connect fail for node-2:0 with key aaaaffff " makes it look 
like your fabric may not be fully mapped or that you may have a down link.

thanks,
-reese
Myricom, Inc.


  I was initially using 1.1.2 and moved to 1.2b2 because of a hang in 
MPI_Bcast() that 1.2b2 is reported to fix, and it seems to have done so. My 
compute nodes each have 2 dual-core Xeons, on Myrinet with MX. The problem is 
getting ompi to run on mx only. My machine file is as follows:

  node-1 slots=4 max-slots=4 
  node-2 slots=4 max-slots=4 
  node-3 slots=4 max-slots=4 

  'mpirun' with the minimum number of processes in order to get the error ... 
          mpirun --prefix /usr/local/openmpi-1.2b2 -x LD_LIBRARY_PATH --hostfile ./h1-3 -np 2 --mca btl mx,self ./cpi 

  I don't believe there's anything wrong with the hardware, as I can ping on mx 
between this failed node and the master fine. I also tried a different set of 3 
nodes and got the same error: it always fails on the 2nd node of any group of 
nodes I choose.
