Hello,

I have a problem running an MPI program with the Open MPI library. I did the 
following.


 1.- I installed OFED 1.5.4 from RHEL. The hardware is QLogic 7340 IB 
cards.

2.- I am using Open MPI 1.4.3, the one that comes with OFED 1.5.4.

3.- I have checked the Open MPI website, and I meet all the requirements they 
ask for:

        passwordless SSH
        the same OFED/Open MPI version on all the cluster nodes 
        InfiniBand connectivity between the nodes, etc.
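For reference, these are the checks behind point 3 (a sketch; dirac12 stands in 
for any remote node, and the commands are only printed here because they need 
the cluster itself):

```shell
# Sketch: how each prerequisite can be re-verified from a login node.
# These need the actual cluster, so they are printed rather than executed.
for check in \
  'ssh -o BatchMode=yes dirac12 true        # passwordless SSH: must not prompt' \
  'ssh dirac12 ompi_info | grep "Open MPI:" # version must match the local node' \
  'ibstat                                   # IB port should report State: Active'
do
  echo "$check"
done
```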

4.- When I run an MPI program it runs properly on one node, but it doesn't run 
on more than one node. The error I can see in the execution is the following:

[dirac13.ciemat.es:06415] plm:tm: failed to poll for a spawned daemon, return 
status = 17002 
--------------------------------------------------------------------------

A daemon (pid unknown) died unexpectedly on signal 1  while attempting to 
launch so we are aborting.



There may be more information reported by the environment (see above).



This may be because the daemon was unable to find all the needed shared 
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the 
location of the shared libraries on the remote nodes and this will 
automatically be forwarded to the remote nodes.

--------------------------------------------------------------------------
--------------------------------------------------------------------------

mpiexec noticed that the job aborted, but has no info as to the process that 
caused that situation.

--------------------------------------------------------------------------
--------------------------------------------------------------------------

mpiexec was unable to cleanly terminate the daemons on the nodes shown below. 
Additional manual cleanup may be required - please refer to the "orte-clean" 
tool for assistance.

--------------------------------------------------------------------------

        dirac12.ciemat.es - daemon did not report back when launched



The command I use to run the MPI program is the following:


        mpiexec -H dirac12,dirac13 ./cpi

I have also tried:

        mpiexec -np 24 -H dirac12,dirac13 ./cpi

And submitting to the batch system:

        mpiexec -np 24 -hostfile $PBS_NODEFILE ./cpi

All of them with the same result.
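If it helps, I can re-run with verbose launch debugging; `plm_base_verbose` is 
a standard Open MPI MCA debug parameter, and the hostnames are the ones from 
the runs above (printed here rather than executed, since it only makes sense 
on the cluster):

```shell
# Sketch: a verbose relaunch that should show at which step the remote
# daemon (orted) dies during startup.
cmd='mpiexec --mca plm_base_verbose 10 -H dirac12,dirac13 ./cpi'
echo "$cmd"
```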


The MPI libraries are the same on all the nodes of the cluster.
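Since the error message points at LD_LIBRARY_PATH: the daemon is started 
through a non-interactive shell, which skips ~/.bash_profile, so it can see a 
much emptier environment than an interactive login. This small local test is 
just a simulation of that, not the real remote check:

```shell
# Simulate the stripped-down environment a non-interactively launched
# daemon may start with (env -i clears the environment entirely):
env -i /bin/bash --norc -c 'echo "LD_LIBRARY_PATH=[$LD_LIBRARY_PATH]"'
# The real check, on the cluster, would be something like:
#   ssh dirac12 'echo $LD_LIBRARY_PATH; which orted'
```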

Please, could anyone help me?

Thanks,
Alicia

Disclaimer: 
This message and its attached files are intended exclusively for their 
recipients and may contain confidential information. If you received this 
e-mail in error, you are hereby notified that any dissemination, copying or 
disclosure of this communication is strictly prohibited and may be unlawful. 
In this case, please notify us by reply and delete this email and its 
contents immediately. 
----------------------------
