You apparently are running on a cluster that uses Torque, yes? If so, Open MPI 
won't use ssh to do the launch - it launches its daemons through Torque's TM 
interface - so the passwordless ssh setup is irrelevant.
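
If you want to verify that your build actually contains the Torque launch 
support, a quick optional check is to run the ompi_info from your 1.4.3 
install and look for the TM components:

        ompi_info | grep tm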

Did you ensure that your LD_LIBRARY_PATH includes the OMPI install lib location?
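
One way to check is to see what a non-interactive shell on a remote node 
reports, and/or to forward the variable explicitly on the command line. This 
is just a sketch - the install prefix below is a placeholder for wherever your 
OMPI 1.4.3 actually lives:

        ssh dirac12 'echo $LD_LIBRARY_PATH'
        # placeholder prefix - adjust to your actual Open MPI install location
        export LD_LIBRARY_PATH=/opt/openmpi-1.4.3/lib:$LD_LIBRARY_PATH
        mpiexec -x LD_LIBRARY_PATH -H dirac12,dirac13 ./cpi

The -x option exports the named environment variable to the launched daemons 
and processes; running with --prefix (or configuring OMPI with 
--enable-orterun-prefix-by-default) is another common way around this class 
of problem.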


On May 3, 2012, at 9:59 AM, Acero Fernandez Alicia wrote:

> 
> 
> Hello,
> 
> I have a problem when running an MPI program with the Open MPI library. I did 
> the following.
> 
> 
> 1.- I installed OFED 1.5.4 from RHEL. The hardware is QLogic 7340 IB 
> cards.
> 
> 2.- I am using Open MPI 1.4.3, the one that comes with OFED 1.5.4
> 
> 3.- I have checked the Open MPI website, and I meet all the requirements it lists:
> 
>        passwordless ssh
>        the same OFED/Open MPI version on all the cluster nodes 
>        InfiniBand connectivity between the nodes, etc.
> 
> 4.- When I run an MPI program it runs properly on one node, but it doesn't 
> run on more than one node. The error I see during execution is the 
> following:
> 
> [dirac13.ciemat.es:06415] plm:tm: failed to poll for a spawned daemon, return 
> status = 17002
> --------------------------------------------------------------------------
> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to 
> launch so we are aborting.
> 
> There may be more information reported by the environment (see above).
> 
> This may be because the daemon was unable to find all the needed shared 
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the 
> location of the shared libraries on the remote nodes and this will 
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec noticed that the job aborted, but has no info as to the process that 
> caused that situation.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec was unable to cleanly terminate the daemons on the nodes shown below. 
> Additional manual cleanup may be required - please refer to the "orte-clean" 
> tool for assistance.
> --------------------------------------------------------------------------
> 
>        dirac12.ciemat.es - daemon did not report back when launched
> 
> 
> 
> The command I use to run the MPI program is the following:
> 
> 
>        mpiexec -H dirac12,dirac13 ./cpi
> 
> I have also tried
> 
>        mpiexec -np 24 -H dirac12,dirac13 ./cpi
> 
> And submitting to the batch system:
> 
>        mpiexec -np 24 -hostfile $PBS_NODEFILE ./cpi
> 
> All of them with the same result.
> 
> 
> All the MPI libraries in the cluster are the same on all the nodes.
> 
> Please, could anyone help me?
> 
> Thanks,
> Alicia
> 
> 

