Hello All,

I installed Open MPI 1.4.3 on our new HPC blades, which use an
InfiniBand interconnect.

My system environment is as follows:

1) uname -a output:
Linux gulftown 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

2) /home is mounted on all nodes, and mpirun is started under
/home/...

Open MPI and the application codes are compiled with the Intel(R)
compilers, version 11. The InfiniBand stack is Mellanox OFED 1.5.2.

I have two questions about mpirun:

a) How can I find out which network interconnect the MPI
application is actually using?

I pass "--mca btl openib,self,sm,tcp" to mpirun, but I want to
make sure it really uses the InfiniBand interconnect.
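
For concreteness, the command line has roughly the shape sketched below.
The process count, the host list, and ./my_app are placeholders, not my
real values. The second form is only a guess at how one might check: it
leaves tcp out of the list, so the job should abort rather than fall back
to TCP if openib cannot be used, and it raises btl_base_verbose to get
more output about BTL selection. The script only prints the command
lines; it does not run them.

```shell
#!/bin/sh
# Placeholders (assumptions, not values from my cluster):
NP=4
HOSTS="ibnode001,ibnode002,ibnode003"
APP="./my_app"

# What I run today: openib is listed, but tcp is also allowed.
current_cmd() {
    echo "mpirun -np $NP --host $HOSTS --mca btl openib,self,sm,tcp $APP"
}

# Possible check (assumption): no tcp in the list, verbose BTL output.
check_cmd() {
    echo "mpirun -np $NP --host $HOSTS --mca btl openib,self,sm --mca btl_base_verbose 30 $APP"
}

current_cmd
check_cmd
```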

b) When I run mpirun, I get the following message:
====== Quote begin
bash: orted: command not found
bash: orted: command not found
bash: orted: command not found
--------------------------------------------------------------------------
A daemon (pid 15120) died unexpectedly with status 127 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
        ibnode001 - daemon did not report back when launched
        ibnode002 - daemon did not report back when launched
        ibnode003 - daemon did not report back when launched

====== Quote end

It seems orted is not found on the slave nodes. I tried passing
--prefix, --path, or -x options to mpirun to set PATH and
LD_LIBRARY_PATH so that orted and its shared libraries are available
on the slave nodes, but none of these works the way the mpirun manual
page describes. The only case that works is setting PATH and
LD_LIBRARY_PATH in ~/.bashrc, which is also sourced on the slave nodes
at login. I do not want to set PATH and LD_LIBRARY_PATH in ~/.bashrc;
I would rather pass options to mpirun directly.
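
The attempts have roughly the shape sketched below. The install prefix
/opt/openmpi-1.4.3, the process count, the host list, and ./my_app are
placeholders chosen for illustration, not my real values. The script only
prints the command lines; it does not run them.

```shell
#!/bin/sh
# Placeholders (assumptions, not values from my cluster):
OMPI_PREFIX="/opt/openmpi-1.4.3"   # hypothetical install prefix
NP=4
HOSTS="ibnode001,ibnode002,ibnode003"
APP="./my_app"

# Attempt 1: ask mpirun to set PATH and LD_LIBRARY_PATH remotely.
prefix_cmd() {
    echo "mpirun --prefix $OMPI_PREFIX -np $NP --host $HOSTS $APP"
}

# Attempt 2: point mpirun at the directory holding the executables.
path_cmd() {
    echo "mpirun --path $OMPI_PREFIX/bin -np $NP --host $HOSTS $APP"
}

# Attempt 3: export my local PATH and LD_LIBRARY_PATH with -x.
export_cmd() {
    echo "mpirun -x PATH -x LD_LIBRARY_PATH -np $NP --host $HOSTS $APP"
}

prefix_cmd
path_cmd
export_cmd
```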

Thanks,
Yiguang
