On Aug 15, 2014, at 5:39 PM, Maxime Boissonneault <maxime.boissonnea...@calculquebec.ca> wrote:
> Correct.
>
> Can it be because torque (pbs_mom) is not running on the head node and
> mpiexec attempts to contact it?

Not for Open MPI's mpiexec, no.

Open MPI's mpiexec (mpirun -- they're the same to us) will only try to use the TM stuff (i.e., Torque stuff) if it sees the environment variable markers indicating that it's inside a Torque job. If not, it just uses rsh/ssh (or localhost launch in your case, since you didn't specify any hosts). A quick way to check for those markers is sketched at the end of this message.

If you are unable to run even "mpirun -np 4 hostname" (i.e., the non-MPI "hostname" command from Linux), then something is seriously borked with your Open MPI installation. Try running with:

    mpirun -np 4 --mca plm_base_verbose 10 hostname

This should show the steps OMPI is trying to take to launch the 4 copies of "hostname", and potentially give some insight into where it's hanging.

Also, just to make sure: you have ensured that you're compiling everything with a single compiler toolchain, and that the support libraries from that specific toolchain are available on every server on which you're running (including the head node and the compute nodes), right?

And you've verified that PATH and LD_LIBRARY_PATH are pointing to the right places -- i.e., to the Open MPI installation that you expect them to point to? E.g., if you "ldd ring_c", it shows the libmpi.so that you expect, and "which mpiexec" shows the mpirun that you expect. Correct? (A short verification sequence is also sketched at the end.)

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
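For concreteness, a minimal sketch of the environment-marker check described above. It assumes the usual variables that a Torque/PBS job sets (PBS_ENVIRONMENT, PBS_JOBID, PBS_NODEFILE):

    # Inside a Torque job, pbs_mom sets these; outside a job this should
    # print nothing, and Open MPI falls back to rsh/ssh or localhost launch.
    env | grep -E '^PBS_(ENVIRONMENT|JOBID|NODEFILE)='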
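To take Torque out of the picture entirely, one can also force the rsh/ssh launcher explicitly ("plm" is Open MPI's Process Lifecycle Management framework; "rsh" is its rsh/ssh component). Outside a Torque job this should behave the same as the default selection:

    # Force the rsh/ssh launcher instead of letting OMPI auto-select one
    mpirun --mca plm rsh -np 4 hostname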
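And a short sketch of the PATH / LD_LIBRARY_PATH verification described above. Here "ring_c" is assumed to be the compiled example from Open MPI's examples/ directory; adjust the path to wherever you built it:

    which mpirun mpiexec       # should point into the expected Open MPI install
    mpirun --version           # should report the version you built
    echo "$PATH"               # should list that install's bin/ directory first
    echo "$LD_LIBRARY_PATH"    # should list that install's lib/ directory
    ldd ring_c | grep libmpi   # should resolve to that install's libmpi.so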