Dear all,

I am quite new to Open MPI. Recently, I installed openmpi-1.3.3 separately on two of my machines, ict1 and ict2. These machines are dual-socket quad-core (Intel Xeon E5410), i.e. each has 8 cores, and they are connected by a Gigabit Ethernet switch. As a prerequisite, I can ssh between them without a password or passphrase (I did not set a passphrase at all). Thereafter, I did the following:
$ cd openmpi-1.3.3
$ mkdir build
$ cd build
$ ../configure --prefix=/usr/local/openmpi-1.3.3/

Then, as the root user:

# make all install

Also, .bash_profile and .bashrc had the following lines written into them:

PATH=$PATH:/usr/local/openmpi-1.3.3/bin/
LD_LIBRARY_PATH=/usr/local/openmpi-1.3.3/lib/

--------------------------------------------------------------------------

$ cd ../examples/
$ make
$ mpirun -np 2 --host ict1 hello_c
hello_c: error while loading shared libraries: libmpi.so.0: cannot open shared object file: No such file or directory
hello_c: error while loading shared libraries: libmpi.so.0: cannot open shared object file: No such file or directory

$ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1 hello_c
Hello, world, I am 1 of 2
Hello, world, I am 0 of 2

But the program hangs when I run:

$ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1,ict2 hello_c

This command does not produce any output. Running top on either machine does not show any hello_c process. However, when I press Ctrl+C, the following output appears:

^Cmpirun: killing job...

--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown below. Additional manual cleanup may be required - please refer to the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
        ict2 - daemon did not report back when launched
$

The same thing happens when hello_c is run from ict2. Since the program does not produce any error message, it is difficult to locate where I might have gone wrong.
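
For reference, here is a minimal sketch of how the two environment settings quoted above could be written with export, so that child processes such as mpirun and hello_c inherit them. The variable name OMPI_PREFIX is only illustrative, and the prefix value is taken from the configure line above. Whether a missing export explains the first libmpi.so.0 error is an assumption, not something the output confirms.

```shell
# Illustrative variable; the value is the prefix from the configure line above.
OMPI_PREFIX=/usr/local/openmpi-1.3.3

# Export both variables so that programs started from this shell inherit them.
export PATH="$PATH:$OMPI_PREFIX/bin"

# Prepend the library directory, keeping any existing value after it.
export LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# A non-interactive ssh shell may not read the same startup files as a login
# shell. One way to see what it gets (run by hand, not part of this snippet):
#   ssh ict2 'echo $PATH; echo $LD_LIBRARY_PATH; which orted'
```

If the ssh check prints nothing for orted, the remote shell probably does not have the Open MPI bin directory on its PATH. The --prefix option used above is meant to cover that case.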
Has anyone encountered this problem or anything similar? Any help would be much appreciated.

Thanks,
-- Souvik