You apparently are running on a cluster that uses Torque, yes? If so, Open MPI won't use ssh to do the launch - it launches through the Torque (TM) interface instead - so the passwordless ssh setup is irrelevant here.
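If you want to double-check which launcher is actually being used, a quick sanity check along these lines may help (a rough sketch; the component names assume a 1.4-series build with Torque support compiled in):

  # confirm the Torque (TM) launch component was built into your install
  ompi_info | grep tm

  # force the Torque launcher explicitly from inside the job
  mpiexec -mca plm tm -np 24 ./cpi

  # or force the ssh/rsh launcher to see whether the failure is Torque-specific
  mpiexec -mca plm rsh -np 24 -H dirac12,dirac13 ./cpi

If the rsh variant works but the tm one does not, that points at the Torque integration rather than at the library paths.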
Did you ensure that your LD_LIBRARY_PATH includes the OMPI install lib location? (A sketch of a job script that does this follows the quoted message below.)

On May 3, 2012, at 9:59 AM, Acero Fernandez Alicia wrote:

> Hello,
>
> I have a problem when running an MPI program with the Open MPI library. I did the following:
>
> 1.- I installed OFED 1.5.4 from RHEL. The hardware is QLogic 7340 IB cards.
>
> 2.- I am using Open MPI 1.4.3, the one that comes with OFED 1.5.4.
>
> 3.- I have checked the Open MPI website, and I have all the requirements they ask for:
>
>    passwordless ssh
>    the same OFED/Open MPI version on all the cluster nodes
>    InfiniBand connectivity between the nodes, etc.
>
> 4.- When I run an MPI program it runs properly on one node, but it doesn't run on more than one node. The error I see in the execution is the following:
>
> [dirac13.ciemat.es:06415] plm:tm: failed to poll for a spawned daemon, return status = 17002
> --------------------------------------------------------------------------
> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
> launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec noticed that the job aborted, but has no info as to the process that
> caused that situation.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec was unable to cleanly terminate the daemons on the nodes shown below.
> Additional manual cleanup may be required - please refer to the "orte-clean"
> tool for assistance.
> --------------------------------------------------------------------------
> dirac12.ciemat.es - daemon did not report back when launched
>
> The command I use to run the MPI program is the following:
>
> mpiexec -H dirac12,dirac13 ./cpi
>
> I have also tried
>
> mpiexec -np 24 -H dirac12,dirac13 ./cpi
>
> and, submitting to the batch system,
>
> mpiexec -np 24 -hostfile $PBS_NODEFILE ./cpi
>
> All of them give the same result.
>
> All the MPI libraries in the cluster are the same on all the nodes.
>
> Please, could anyone help me?
>
> Thanks,
> Alicia
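For reference, here is a minimal sketch of a Torque job script that puts the Open MPI libraries on the path before launching; the install prefix /opt/openmpi-1.4.3 is only a placeholder for wherever your build actually lives:

  #!/bin/bash
  #PBS -l nodes=2:ppn=12
  #PBS -N cpi_test

  # placeholder prefix - adjust to your actual Open MPI install location
  OMPI_PREFIX=/opt/openmpi-1.4.3
  export PATH=$OMPI_PREFIX/bin:$PATH
  export LD_LIBRARY_PATH=$OMPI_PREFIX/lib:$LD_LIBRARY_PATH

  cd $PBS_O_WORKDIR

  # --prefix tells the remote daemons where to find the Open MPI install;
  # -x additionally forwards LD_LIBRARY_PATH to the launched processes
  mpiexec --prefix $OMPI_PREFIX -x LD_LIBRARY_PATH -np 24 ./cpi

Setting the same variables in your shell startup files (e.g. ~/.bashrc) on every node is another common approach if you would rather not touch the mpiexec command line.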