Hello All,

I installed Open MPI 1.4.3 on our new HPC blades, which are connected by InfiniBand.

My system environment is as follows:

1) uname -a output:
   Linux gulftown 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

2) /home is mounted on all nodes, and mpirun is started under /home/...

Open MPI and the application codes are compiled with the Intel(R) compilers V11. The InfiniBand stack is Mellanox OFED 1.5.2.

I have two questions about mpirun:

a) How can I find out which network interconnect the MPI application is actually using? I pass "--mca btl openib,self,sm,tcp" to mpirun, but I want to make sure it really uses the InfiniBand interconnect.

b) When I run mpirun, I get the following message:

====== Quote begin
bash: orted: command not found
bash: orted: command not found
bash: orted: command not found
--------------------------------------------------------------------------
A daemon (pid 15120) died unexpectedly with status 127 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
        ibnode001 - daemon did not report back when launched
        ibnode002 - daemon did not report back when launched
        ibnode003 - daemon did not report back when launched
====== Quote end

It seems that orted is not found on the slave nodes. If I try to make orted and the related dynamic libraries available on the slave nodes by setting PATH and LD_LIBRARY_PATH through the --prefix, --path, or -x options to mpirun, it does not work the way the mpirun manual page describes. The only case that works is setting PATH and LD_LIBRARY_PATH in ~/.bashrc, which is also sourced on the slave nodes for the login shell.

I do not want to set PATH and LD_LIBRARY_PATH in ~/.bashrc; I would rather pass options to mpirun directly.

Thanks,
Yiguang
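
P.S. For reference, the three approaches mentioned above (--prefix, --path, -x) correspond roughly to command lines of the following form. This is only a sketch: the install prefix, hostfile name, and application name are placeholders rather than values from the actual setup, and the script prints the command lines instead of running them, so it can be tried without Open MPI installed.

```shell
#!/bin/sh
# Placeholders (assumptions for illustration only, not the real values):
OMPI_PREFIX="/path/to/openmpi"   # hypothetical Open MPI install prefix
HOSTFILE="hosts"                 # hypothetical hostfile listing the nodes
APP="./my_app"                   # hypothetical MPI application
BTL="openib,self,sm,tcp"         # BTL list quoted in the post

# Print (do not execute) the mpirun command line for one variant.
show_variant() {
    case "$1" in
        # --prefix: tell mpirun where Open MPI is installed on the remote nodes
        prefix) echo "mpirun --prefix $OMPI_PREFIX --mca btl $BTL --hostfile $HOSTFILE $APP" ;;
        # --path: directory searched for the requested executables
        path)   echo "mpirun --path $OMPI_PREFIX/bin --mca btl $BTL --hostfile $HOSTFILE $APP" ;;
        # -x: export the named environment variables to the launched processes
        export) echo "mpirun -x PATH -x LD_LIBRARY_PATH --mca btl $BTL --hostfile $HOSTFILE $APP" ;;
        *)      echo "unknown variant: $1" >&2; return 1 ;;
    esac
}

for v in prefix path export; do
    show_variant "$v"
done
```

Replacing the placeholders and dropping the echo would give the actual invocations; the point is only to make explicit which options are being compared.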