Hi,

I successfully installed openmpi-1.8.5 on my machines (Solaris 10
Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with
gcc-4.9.2 and Sun C 5.13. With both compilers I get the error below
if I use the following command, but no error if I swap the order of
the first two machines. I also get no errors with
openmpi-dev-1708-g8497a6a, regardless of the order of the machines.


tyr hello_1 109 which mpicc
/usr/local/openmpi-1.8.5_64_cc/bin/mpicc
tyr hello_1 110 mpiexec -np 5 -host sunpc1,linpc1,tyr,rs0 hello_1_mpi
ld.so.1: ssh: fatal: relocation error: file /usr/bin/ssh: symbol SUNWcry_installed: referenced symbol not found
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--------------------------------------------------------------------------



After this message the program hangs, and "top" shows that "orterun"
is using a lot of CPU time.

   PID USERNAME THR PR NCE  SIZE   RES STATE   TIME FLTS    CPU COMMAND
 29550 fd1026     2  0   0 14.5M 8576K cpu01   1:06    0 47.72% orterun




tyr hello_1 116 mpiexec -np 5 -host linpc1,sunpc1,tyr,rs0 hello_1_mpi
Process 2 of 5 running on sunpc1
Process 4 of 5 running on rs0.informatik.hs-fulda.de
Process 3 of 5 running on tyr.informatik.hs-fulda.de
Process 1 of 5 running on linpc1
Process 0 of 5 running on linpc1
...



Everything works fine with openmpi-dev-1708-g8497a6a.

tyr hello_1 120 which mpicc
/usr/local/openmpi-1.9.0_64_gcc/bin/mpicc
tyr hello_1 121 mpiexec -np 5 -host sunpc1,linpc1,tyr,rs0 hello_1_mpi
Process 2 of 5 running on linpc1
Process 0 of 5 running on sunpc1
Process 1 of 5 running on sunpc1
Process 4 of 5 running on rs0.informatik.hs-fulda.de
Process 3 of 5 running on tyr.informatik.hs-fulda.de
...


Any ideas what's going wrong? I would be grateful if somebody could
fix the problem. Thank you very much in advance for any help.


Kind regards

Siegmar
