Hello,

I am trying to run a program on a cluster of Apple Xserve machines running
Mac OS X 10.5.8 (Leopard).


1) I am using openmpi-1.4.4 compiled with the Intel compilers ifort and icc (V12).
(/opt is a share point mounted at /Network/opt over NFS.)

./configure --prefix=/opt/openmpi-1.4.4                             \
F77=/Network/opt/intel/composerxe/bin/ifort F77FLAGS="-arch x86_64" \
FC=/Network/opt/intel/composerxe/bin/ifort  FCFLAGS="-arch x86_64"  \
CC=/Network/opt/intel/composerxe/bin/icc    CFLAGS="-arch x86_64"   \
CXX=/Network/opt/intel/composerxe/bin/icpc  CXXFLAGS="-arch x86_64"

make
sudo make install


The /etc/profile on each of my nodes contains:

export COMP_HOME=/Network/opt/intel/composerxe
export PATH=$COMP_HOME/bin:$COMP_HOME/man:$PATH
export DYLD_LIBRARY_PATH=$COMP_HOME/lib/:$DYLD_LIBRARY_PATH

export MPI_HOME=/Network/opt/openmpi-1.4.4
export OPAL_PREFIX=/Network/opt/openmpi-1.4.4

export PATH=${MPI_HOME}/bin:${MPI_HOME}/man:$PATH
export DYLD_LIBRARY_PATH=$MPI_HOME/lib/:$DYLD_LIBRARY_PATH
export LD_LIBRARY_PATH=$MPI_HOME/lib/:$LD_LIBRARY_PATH

2) When I launch mpirun on several nodes, the MPI connection fails and I get
the following error message:

 mpirun --prefix /Network/opt/openmpi-1.4.4/ -H node1,node2 -n 2 space64 -f Test/Euler/eulerRigid.def
dyld: lazy symbol binding failed: Symbol not found: _orte_daemon
  Referenced from: /Network/opt/openmpi-1.4.4/bin/orted
  Expected in: /usr/lib/libopen-rte.0.dylib

dyld: Symbol not found: _orte_daemon
  Referenced from: /Network/opt/openmpi-1.4.4/bin/orted
  Expected in: /usr/lib/libopen-rte.0.dylib

bash: line 1:  2973 Trace/BPT trap          /Network/opt/openmpi-1.4.4/bin/orted --daemonize -mca ess env -mca orte_ess_jobid 1644560384 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri "1644560384.0;tcp://10.0.0.1:50782;tcp://125.1.4.55:50782"
--------------------------------------------------------------------------
A daemon (pid 41667) died unexpectedly with status 133 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished
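
The error names /usr/lib/libopen-rte.0.dylib, although Open MPI 1.4.4 lives
under /Network/opt/openmpi-1.4.4. One check that might help narrow this down is
to see which directory of a given search path provides libopen-rte.0.dylib
first. The following is only a rough sketch: find_first_lib is a hypothetical
helper name, and it simply walks a colon-separated list in order, the way
DYLD_LIBRARY_PATH is searched. It does not reproduce the rest of dyld's lookup
(install names recorded in the binary, fallback paths).

```shell
#!/bin/sh
# find_first_lib (hypothetical helper): print the first directory in a
# colon-separated search path that contains the given library file.
# Usage: find_first_lib LIBNAME SEARCHPATH
find_first_lib() {
    lib=$1
    old_ifs=$IFS
    IFS=:
    for dir in $2; do
        # Skip empty entries produced by leading, trailing or double colons.
        [ -n "$dir" ] || continue
        if [ -f "$dir/$lib" ]; then
            IFS=$old_ifs
            # Strip a trailing slash so "/x/lib/" and "/x/lib" print the same.
            printf '%s\n' "${dir%/}"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Example: search the current DYLD_LIBRARY_PATH first, then /usr/lib.
find_first_lib libopen-rte.0.dylib "${DYLD_LIBRARY_PATH}:/usr/lib" \
    || echo "libopen-rte.0.dylib not found in search path"
```

If this prints /usr/lib when run in the same kind of shell that starts orted on
the remote node, that shell is probably not seeing the DYLD_LIBRARY_PATH set in
/etc/profile. This is a guess, not something I have confirmed.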


3) Does anyone have an idea of what is going wrong?


--
Christophe Peyret
ONERA - DSNA - PS3A
29 ave de la Division Leclerc
F92320 Chatillon
Tel. : +331 4673 4778
Fax : +331 4673 4166

http://www.onera.fr/dsna/couplage-methodes-aeroacoustiques




