On Aug 15, 2014, at 5:39 PM, Maxime Boissonneault 
<maxime.boissonnea...@calculquebec.ca> wrote:

> Correct.
> 
> Can it be because torque (pbs_mom) is not running on the head node and 
> mpiexec attempts to contact it ?

Not for Open MPI's mpiexec, no.

Open MPI's mpiexec (mpirun -- they're the same to us) will only try to use TM 
stuff (i.e., Torque stuff) if it sees the environment variable markers 
indicating that it's inside a Torque job.  If not, it just uses rsh/ssh (or 
localhost launch in your case, since you didn't specify any hosts).
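In case it helps, here's a minimal sketch of that decision logic. It assumes (my assumption, not verified against the source) that the markers in question are the PBS_* variables that pbs_mom sets inside a job, such as PBS_ENVIRONMENT and PBS_JOBID:

```shell
# Sketch only: mimics how the launcher picks a launch mechanism.
# PBS_ENVIRONMENT / PBS_JOBID are assumed to be the Torque job markers.
if [ -n "$PBS_ENVIRONMENT" ] && [ -n "$PBS_JOBID" ]; then
    echo "inside a Torque job: would use TM launch"
else
    echo "no Torque markers: rsh/ssh (or localhost) launch"
fi
```

So on a head node with no pbs_mom and no job environment, you'd expect the second branch, i.e., plain localhost launch.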

If you are unable to run even "mpirun -np 4 hostname" (i.e., the non-MPI 
"hostname" command from Linux), then something is seriously borked with your 
Open MPI installation.

Try running with:

    mpirun -np 4 --mca plm_base_verbose 10 hostname

This should show the steps OMPI is trying to take to launch the 4 copies of 
"hostname" and potentially give some insight into where it's hanging.

Also, just to make sure, you have ensured that you're compiling everything with 
a single compiler toolchain, and the support libraries from that specific 
compiler toolchain are available on any server on which you're running (to 
include the head node and compute nodes), right?

And you've verified that PATH and LD_LIBRARY_PATH are pointing to the right 
places -- i.e., to the Open MPI installation that you expect them to point to.  
E.g., "ldd ring_c" shows the libmpi.so that you expect, and "which mpiexec" 
shows the mpirun that you expect.  Etc.
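One way to eyeball that consistency check: the mpiexec on your PATH and the libmpi.so that "ldd ring_c" reports should live under the same install prefix. A hedged sketch (the /opt/openmpi path below is purely hypothetical):

```shell
# Strip .../bin/mpiexec down to the install prefix.
prefix_of() { dirname "$(dirname "$1")"; }

which mpiexec                          # expect a path under your Open MPI install
prefix_of "$(which mpiexec)"           # e.g. /opt/openmpi (hypothetical)
ldd ring_c | grep libmpi.so            # expect libmpi.so under that same prefix
```

If the two prefixes differ, you're mixing installations, which commonly produces exactly this kind of hang.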

Correct?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
