Siegmar,

can you run

LD_LIBRARY_PATH= LD_LIBRARY_PATH_64= /usr/bin/ssh

on all your boxes?
The root cause could be that you try to run ssh on box A with the environment of box B.
Can you also run with -output-tag (or -tag-output), so we can figure out on which box ssh is failing?

Cheers,

Gilles

On Friday, May 15, 2015, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi,
>
> I successfully installed openmpi-1.8.5 on my machines (Solaris 10
> Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with
> gcc-4.9.2 and Sun C 5.13. I get the same error for both compilers,
> if I use the following command and no errors if I change the order
> of the first two machines. I also get no errors if I use
> openmpi-dev-1708-g8497a6a for an arbitrary order of the machines.
>
>
> tyr hello_1 109 which mpicc
> /usr/local/openmpi-1.8.5_64_cc/bin/mpicc
> tyr hello_1 110 mpiexec -np 5 -host sunpc1,linpc1,tyr,rs0 hello_1_mpi
> ld.so.1: ssh: fatal: relocation error: file /usr/bin/ssh: symbol
> SUNWcry_installed: referenced symbol not found
> --------------------------------------------------------------------------
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
>
> * not finding the required libraries and/or binaries on
>   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>   settings, or configure OMPI with --enable-orterun-prefix-by-default
>
> * lack of authority to execute on one or more specified nodes.
>   Please verify your allocation and authorities.
>
> * the inability to write startup files into /tmp
>   (--tmpdir/orte_tmpdir_base).
>   Please check with your sys admin to determine the correct location to
>   use.
>
> * compilation of the orted with dynamic libraries when static are required
>   (e.g., on Cray). Please check your configure cmd line and consider using
>   one of the contrib/platform definitions for your system type.
>
> * an inability to create a connection back to mpirun due to a
>   lack of common network interfaces and/or no route found between
>   them. Please check network connectivity (including firewalls
>   and network routing requirements).
> --------------------------------------------------------------------------
>
>
>
> Now the program hangs and "top" shows that "orterun" is very busy.
>
>   PID USERNAME THR PR NCE  SIZE   RES STATE  TIME FLTS    CPU COMMAND
> 29550 fd1026     2  0   0 14.5M 8576K cpu01  1:06    0 47.72% orterun
>
>
>
>
> tyr hello_1 116 mpiexec -np 5 -host linpc1,sunpc1,tyr,rs0 hello_1_mpi
> Process 2 of 5 running on sunpc1
> Process 4 of 5 running on rs0.informatik.hs-fulda.de
> Process 3 of 5 running on tyr.informatik.hs-fulda.de
> Process 1 of 5 running on linpc1
> Process 0 of 5 running on linpc1
> ...
>
>
>
> Everything works fine with openmpi-dev-1708-g8497a6a.
>
> tyr hello_1 120 which mpicc
> /usr/local/openmpi-1.9.0_64_gcc/bin/mpicc
> tyr hello_1 121 mpiexec -np 5 -host sunpc1,linpc1,tyr,rs0 hello_1_mpi
> Process 2 of 5 running on linpc1
> Process 0 of 5 running on sunpc1
> Process 1 of 5 running on sunpc1
> Process 4 of 5 running on rs0.informatik.hs-fulda.de
> Process 3 of 5 running on tyr.informatik.hs-fulda.de
> ...
>
>
> Any ideas what's going wrong? I would be grateful if somebody can
> fix the problem. Thank you very much for any help in advance.
>
>
> Kind regards
>
> Siegmar
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/05/26871.php
>