Hi Gilles,

> can you run
> LD_LIBRARY_PATH= LD_LIBRARY_PATH64= /usr/bin/ssh
> on all your boxes ?
>
> the root cause could be you try to run ssh on box A with the env of box B
No, the environment should be the same on all boxes (it worked before, and it still works with openmpi-1.9 in the same environment). We don't use /usr/bin/ssh because it doesn't work:

tyr hello_1 279 /usr/bin/ssh sunpc1 date
ld.so.1: ssh: fatal: relocation error: file /usr/bin/ssh: symbol SUNWcry_installed: referenced symbol not found
Killed
tyr hello_1 280 /usr/bin/ssh linpc1 date
ld.so.1: ssh: fatal: relocation error: file /usr/bin/ssh: symbol SUNWcry_installed: referenced symbol not found
Killed
tyr hello_1 281 /usr/bin/ssh rs0 date
ld.so.1: ssh: fatal: relocation error: file /usr/bin/ssh: symbol SUNWcry_installed: referenced symbol not found
Killed
tyr hello_1 282 /usr/bin/ssh tyr date
ld.so.1: ssh: fatal: relocation error: file /usr/bin/ssh: symbol SUNWcry_installed: referenced symbol not found
Killed
tyr hello_1 283

We use /usr/local/bin/ssh.

tyr hello_1 284 ssh tyr where ssh
ssh is aliased to /usr/local/bin/ssh -q -F /usr/local/etc/ssh/ssh_config
/usr/local/bin/ssh
/usr/bin/ssh
tyr hello_1 285 ssh sunpc1 where ssh
ssh is aliased to /usr/local/bin/ssh -q -F /usr/local/etc/ssh/ssh_config
/usr/local/bin/ssh
/usr/bin/ssh
tyr hello_1 286 ssh linpc1 where ssh
ssh ist ein Alias für /usr/local/bin/ssh -q -F /usr/local/etc/ssh/ssh_config
/usr/local/bin/ssh
/usr/bin/ssh
tyr hello_1 287 ssh rs0 where ssh
ssh is aliased to /usr/local/bin/ssh -q -F /usr/local/etc/ssh/ssh_config
/usr/local/bin/ssh
/usr/bin/ssh
tyr hello_1 288

> can you also run with the -output-tag (or -tag-output) so we can figure out
> on which box ssh is failing

tyr hello_1 114 mpiexec -np 5 --host sunpc1,linpc1,tyr,rs0 -output-tag hello_1_mpi
mpiexec: Error: unknown option "-output-tag"
Type 'mpiexec --help' for usage.
tyr hello_1 115 mpiexec -np 5 --host sunpc1,linpc1,tyr,rs0 -tag-output hello_1_mpi
ld.so.1: ssh: fatal: relocation error: file /usr/bin/ssh: symbol SUNWcry_installed: referenced symbol not found
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
...

The output is the same as before and the process hangs as before, so the new flag didn't change anything. Sorry for the bad news.

I have another small program that prints the environment. Sometimes it fails with the same error as above, and sometimes it works as expected. The following commands fail:

tyr hello_1 259 mpiexec -np 7 --host sunpc0,sunpc1,linpc0,linpc1,tyr,rs0 environ_mpi
tyr hello_1 264 mpiexec -np 5 --host sunpc1,linpc1,tyr,rs0 environ_mpi
tyr hello_1 269 mpiexec -np 5 --host tyr,rs0,sunpc1,linpc1 environ_mpi | more
tyr hello_1 263 mpiexec -np 4 --host sunpc1,linpc1,rs0 environ_mpi

The following commands work fine:

tyr hello_1 249 mpiexec -np 7 --host linpc0,linpc1,sunpc0,sunpc1,tyr,rs0 environ_mpi
tyr hello_1 261 mpiexec -np 4 --host sunpc1,linpc1,tyr environ_mpi
tyr hello_1 266 mpiexec -np 4 --host sunpc1,tyr,rs0 environ_mpi

These variations of the last failing command (see above) also work fine:

tyr hello_1 272 mpiexec -np 3 --host sunpc1,rs0 environ_mpi
tyr hello_1 273 mpiexec -np 3 --host linpc1,rs0 environ_mpi
tyr hello_1 277 mpiexec -np 3 --host sunpc1,linpc1 environ_mpi

Open MPI sees the following environment on Solaris 10 x86_64, Linux x86_64, and Solaris 10 Sparc:

tyr hello_1 122 mpiexec -np 4 --host sunpc1,linpc1,tyr environ_mpi

Now 3 slave tasks are sending their environment.
Environment from task 1:
  message type: 3
  msg length: 3627 characters
  message:
    hostname: sunpc1
    operating system: SunOS
    release: 5.10
    processor: i86pc
    PATH
      bin
      /usr/local/openmpi-1.8.5_64_cc/bin
      /usr/local/NetBeans-4.0/bin
      /usr/local/jdk1.8.0/bin
      /usr/local/apache-ant-1.6.2/bin
      /usr/local/db-derby-10.11.1.1-bin/bin
      /usr/local/gcc-4.9.2/bin
      /opt/solstudio12.4/bin
      /usr/local/bin
      /usr/local/ssl/bin
      /usr/local/pgsql/bin
      /usr/bin
      /usr/openwin/bin
      /usr/dt/bin
      /usr/ccs/bin
      /usr/sfw/bin
      /opt/sfw/bin
      /usr/ucb
      /usr/lib/lp/postscript
      /usr/local/teTeX-1.0.7/bin/i386-pc-solaris2.10
      /usr/local/bluej-2.1.2
      /usr/local/hwloc-1.10.0/bin
      /home/fd1026/SunOS/x86_64/bin
      .
      /usr/sbin
    LD_LIBRARY_PATH_64
      /usr/local/openmpi-1.8.5_64_cc/lib64
      /usr/local/jdk1.8.0/jre/lib/amd64
      /usr/local/gcc-4.9.2/lib/amd64
      /usr/local/gcc-4.9.2/lib/gcc/i386-pc-solaris2.10/4.9.2/amd64
      /usr/local/lib/amd64
      /usr/local/ssl/lib/amd64
      /usr/local/lib64
      /usr/lib/amd64
      /usr/openwin/lib/amd64
      /usr/openwin/server/lib/amd64
      /usr/dt/lib/amd64
      /usr/X11R6/lib/amd64
      /usr/ccs/lib/amd64
      /usr/sfw/lib/amd64
      /opt/sfw/lib/amd64
      /usr/ucblib/amd64
      /usr/local/hwloc-1.10.0/lib64
      /home/fd1026/SunOS/x86_64/lib64
    LD_LIBRARY_PATH
      lib64
      /usr/local/openmpi-1.8.5_64_cc/lib
      /usr/local/jdk1.8.0/jre/lib/i386
      /usr/local/gcc-4.9.2/lib
      /usr/local/gcc-4.9.2/lib/gcc/i386-pc-solaris2.10/4.9.2
      /usr/local/lib
      /usr/local/ssl/lib
      /usr/local/oracle
      /usr/local/pgsql/lib
      /usr/lib
      /usr/openwin/lib
      /usr/openwin/server/lib
      /usr/dt/lib
      /usr/X11R6/lib
      /usr/ccs/lib
      /usr/sfw/lib
      /opt/sfw/lib
      /usr/ucblib
      /usr/local/hwloc-1.10.0/lib
      /usr/lib/gnome-private/lib
      /home/fd1026/SunOS/x86_64/lib
    CLASSPATH
      /usr/local/db-derby-10.11.1.1-bin/lib/derby.jar
      /usr/local/db-derby-10.11.1.1-bin/lib/derbytools.jar
      /usr/local/db-derby-10.11.1.1-bin/lib/derbyrun.jar
      /usr/local/jdk1.8.0/hibernate-jpa-2.0-api-1.0.0.Final.jar
      /usr/local/junit4.10
      /usr/local/junit4.10/junit-4.10.jar
      /usr/local/javacc-5.0/javacc.jar
      .
      /home/fd1026/SunOS/x86_64/mpi_classfiles

Environment from task 2:
  message type: 3
  msg length: 6760 characters
  message:
    hostname: linpc1
    operating system: Linux
    release: 3.1.10-1.29-desktop
    processor: x86_64
    PATH
      bin
      /usr/local/openmpi-1.8.5_64_cc/bin
      /usr/local/NetBeans-4.0/bin
      /usr/local/jdk1.8.0/bin
      /usr/local/apache-ant-1.6.2/bin
      /usr/local/db-derby-10.11.1.1-bin/bin
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/bin/intel64
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/mpirt/bin/intel64
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/debugger/gdb/intel64_mic/py27/bin
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/debugger/gdb/intel64/py27/bin
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/bin/intel64
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/bin/intel64_mic
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/debugger/gui/intel64
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/bin/ia32
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/mpirt/bin/ia32
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/debugger/gdb/intel64_mic/py27/bin
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/debugger/gdb/intel64/py27/bin
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/bin/ia32
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/debugger/gui/ia32
      /usr/local/gcc-4.9.2/bin
      /opt/solstudio12.4/bin
      /usr/local/bin
      /usr/local/ssl/bin
      /usr/local/pgsql/bin
      /bin
      /usr/bin
      /usr/X11R6/bin
      /usr/local/teTeX-1.0.7/bin/i586-pc-linux-gnu
      /usr/local/bluej-2.1.2
      /usr/local/hwloc-1.10.0/bin
      /home/fd1026/Linux/x86_64/bin
      .
      /usr/sbin
    LD_LIBRARY_PATH_64
      /usr/local/openmpi-1.8.5_64_cc/lib64
      /usr/local/jdk1.8.0/jre/lib/amd64
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/compiler/lib/intel64
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/mpirt/lib/intel64
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/ipp/../compiler/lib/intel64
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/ipp/lib/intel64
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/compiler/lib/intel64
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/mkl/lib/intel64
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/tbb/lib/intel64/gcc4.4
      /usr/local/gcc-4.9.2/lib64
      /usr/local/gcc-4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2
      /usr/local/gcc-4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2
      /usr/local/lib64
      /usr/local/ssl/lib64
      /usr/lib64
      /usr/X11R6/lib64
      /usr/local/hwloc-1.10.0/lib64
      /home/fd1026/Linux/x86_64/lib64
    LD_LIBRARY_PATH
      lib64
      /usr/local/openmpi-1.8.5_64_cc/lib
      /usr/local/jdk1.8.0/jre/lib/i386
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/compiler/lib/ia32
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/mpirt/lib/ia32
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/ipp/../compiler/lib/ia32
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/ipp/lib/ia32
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/compiler/lib/ia32
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/mkl/lib/ia32
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/tbb/lib/ia32/gcc4.4
      /usr/local/gcc-4.9.2/lib
      /usr/local/gcc-4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/32
      /usr/local/gcc-4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2/32
      /usr/local/lib
      /usr/local/ssl/lib
      /lib
      /usr/lib
      /usr/X11R6/lib
      /usr/local/hwloc-1.10.0/lib
      /usr/lib/gnome-private/lib
      /home/fd1026/Linux/x86_64/lib
      /usr/local/openmpi-1.8.5_64_cc/lib64
      /usr/local/jdk1.8.0/jre/lib/amd64
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/compiler/lib/intel64
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/mpirt/lib/intel64
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/ipp/../compiler/lib/intel64
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/ipp/lib/intel64
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/compiler/lib/intel64
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/mkl/lib/intel64
      /usr/local/intel_xe_2013/composer_xe_2013_sp1.1.106/tbb/lib/intel64/gcc4.4
      /usr/local/gcc-4.9.2/lib64
      /usr/local/gcc-4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2
      /usr/local/gcc-4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2
      /usr/local/lib64
      /usr/local/ssl/lib64
      /usr/lib64
      /usr/X11R6/lib64
      /usr/local/hwloc-1.10.0/lib64
      /home/fd1026/Linux/x86_64/lib64
    CLASSPATH
      /usr/local/db-derby-10.11.1.1-bin/lib/derby.jar
      /usr/local/db-derby-10.11.1.1-bin/lib/derbytools.jar
      /usr/local/db-derby-10.11.1.1-bin/lib/derbyrun.jar
      /usr/local/jdk1.8.0/hibernate-jpa-2.0-api-1.0.0.Final.jar
      /usr/local/junit4.10
      /usr/local/junit4.10/junit-4.10.jar
      /usr/local/javacc-5.0/javacc.jar
      .
      /home/fd1026/Linux/x86_64/mpi_classfiles

Environment from task 3:
  message type: 3
  msg length: 3676 characters
  message:
    hostname: tyr.informatik.hs-fulda.de
    operating system: SunOS
    release: 5.10
    processor: sun4u
    PATH
      /usr/local/openmpi-1.8.5_64_cc/bin
      /usr/local/NetBeans-4.0/bin
      /usr/local/jdk1.8.0/bin
      /usr/local/apache-ant-1.6.2/bin
      /usr/local/db-derby-10.11.1.1-bin/bin
      /usr/local/gcc-4.9.2/bin
      /opt/solstudio12.4/bin
      /usr/local/bin
      /usr/local/ssl/bin
      /usr/local/pgsql/bin
      /usr/bin
      /usr/openwin/bin
      /usr/dt/bin
      /usr/ccs/bin
      /usr/sfw/bin
      /opt/sfw/bin
      /usr/ucb
      /usr/xpg4/bin
      /usr/local/teTeX-1.0.7/bin/sparc-sun-solaris2.10
      /usr/local/bluej-2.1.2
      /usr/local/hwloc-1.10.0/bin
      /home/fd1026/SunOS/sparc/bin
      .
      /usr/sbin
    LD_LIBRARY_PATH_64
      /usr/local/openmpi-1.8.5_64_cc/lib64
      /usr/local/jdk1.8.0/jre/lib/sparcv9
      /usr/local/gcc-4.9.2/lib/sparcv9
      /usr/local/gcc-4.9.2/lib/gcc/sparc-sun-solaris2.10/4.9.2/sparcv9
      /usr/local/lib/sparcv9
      /usr/local/ssl/lib/sparcv9
      /usr/local/lib64
      /usr/local/oracle/sparcv9
      /usr/local/pgsql/lib/sparcv9
      /lib/sparcv9
      /usr/lib/sparcv9
      /usr/openwin/lib/sparcv9
      /usr/dt/lib/sparcv9
      /usr/X11R6/lib/sparcv9
      /usr/ccs/lib/sparcv9
      /usr/sfw/lib/sparcv9
      /opt/sfw/lib/sparcv9
      /usr/ucblib/sparcv9
      /usr/local/hwloc-1.10.0/lib64
      /home/fd1026/SunOS/sparc/lib64
    LD_LIBRARY_PATH
      /usr/local/openmpi-1.8.5_64_cc/lib
      /usr/local/jdk1.8.0/jre/lib/sparc
      /usr/local/gcc-4.9.2/lib
      /usr/local/gcc-4.9.2/lib/gcc/sparc-sun-solaris2.10/4.9.2
      /usr/local/lib
      /usr/local/ssl/lib
      /usr/local/oracle
      /usr/local/pgsql/lib
      /lib
      /usr/lib
      /usr/openwin/lib
      /usr/dt/lib
      /usr/X11R6/lib
      /usr/ccs/lib
      /usr/sfw/lib
      /opt/sfw/lib
      /usr/ucblib
      /usr/local/hwloc-1.10.0/lib
      /usr/lib/gnome-private/lib
      /home/fd1026/SunOS/sparc/lib
    CLASSPATH
      /usr/local/db-derby-10.11.1.1-bin/lib/derby.jar
      /usr/local/db-derby-10.11.1.1-bin/lib/derbytools.jar
      /usr/local/db-derby-10.11.1.1-bin/lib/derbyrun.jar
      /usr/local/jdk1.8.0/hibernate-jpa-2.0-api-1.0.0.Final.jar
      /usr/local/junit4.10
      /usr/local/junit4.10/junit-4.10.jar
      /usr/local/javacc-5.0/javacc.jar
      .
      /home/fd1026/SunOS/sparc/mpi_classfiles
tyr hello_1 239

Do you see anything strange? I don't know why "bin" appears at the beginning of PATH and "lib64" at the beginning of LD_LIBRARY_PATH for "sunpc1" and "linpc1", because those entries do not appear in the environment variables themselves. I have to investigate it.
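A small helper for that investigation, written as a sketch in POSIX sh (run it under bash, ksh, or /usr/xpg4/bin/sh on Solaris rather than tcsh). It prints each colon-separated entry of a PATH-style value on its own line and flags every entry that is not an absolute path, such as "bin", "lib64" or ".". The function name `show_path_entries` is hypothetical; it is not part of Open MPI or of environ_mpi.

```shell
# show_path_entries VALUE  (hypothetical helper, not part of Open MPI)
# Print each colon-separated entry of VALUE (e.g. a PATH or
# LD_LIBRARY_PATH value) on its own line and flag every entry that
# is not an absolute path, such as "bin", "lib64" or ".".
# The body runs in a subshell "( ... )", so the IFS and "set -f"
# changes below do not leak into the calling shell.
show_path_entries() (
  IFS=:      # split the value only on colons
  set -f     # do not glob-expand entries such as "*"
  for entry in $1; do
    case $entry in
      /*) printf '%s\n' "$entry" ;;
      *)  printf '%s  <-- relative entry\n' "$entry" ;;
    esac
  done
)
```

For example, `show_path_entries "$LD_LIBRARY_PATH"` checks the local value. To check a remote host, keep the variable inside single quotes, as in `show_path_entries "$(ssh linpc1 'echo $PATH')"`, so that `$PATH` is expanded by the remote non-interactive shell. With an unquoted `ssh linpc1 echo $PATH`, the local shell expands `$PATH` before ssh runs, and the output shows the calling host's value rather than linpc1's.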
tyr hello_1 291 ssh linpc1 echo $PATH
/usr/local/openmpi-1.8.5_64_cc/bin:/usr/local/NetBeans-4.0/bin:/usr/local/jdk1.8.0/bin:/usr/local/apache-ant-1.6.2/bin:/usr/local/db-derby-10.11.1.1-bin/bin:/usr/local/gcc-4.9.2/bin:/opt/solstudio12.4/bin:/usr/local/bin:/usr/local/ssl/bin:/usr/local/pgsql/bin:/usr/bin:/usr/openwin/bin:/usr/dt/bin:/usr/ccs/bin:/usr/sfw/bin:/opt/sfw/bin:/usr/ucb:/usr/xpg4/bin:/usr/local/teTeX-1.0.7/bin/sparc-sun-solaris2.10:/usr/local/bluej-2.1.2:/usr/local/hwloc-1.10.0/bin:/home/fd1026/SunOS/sparc/bin:.:/usr/sbin
tyr hello_1 292 ssh linpc1 echo $LD_LIBRARY_PATH
/usr/local/openmpi-1.8.5_64_cc/lib:/usr/local/jdk1.8.0/jre/lib/sparc:/usr/local/gcc-4.9.2/lib:/usr/local/gcc-4.9.2/lib/gcc/sparc-sun-solaris2.10/4.9.2:/usr/local/lib:/usr/local/ssl/lib:/usr/local/oracle:/usr/local/pgsql/lib:/lib:/usr/lib:/usr/openwin/lib:/usr/dt/lib:/usr/X11R6/lib:/usr/ccs/lib:/usr/sfw/lib:/opt/sfw/lib:/usr/ucblib:/usr/local/hwloc-1.10.0/lib:/usr/lib/gnome-private/lib:/home/fd1026/SunOS/sparc/lib
tyr hello_1 293

Kind regards and thank you very much for your help.

Siegmar

> On Friday, May 15, 2015, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de>
> wrote:
>
> > Hi,
> >
> > I successfully installed openmpi-1.8.5 on my machines (Solaris 10
> > Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with
> > gcc-4.9.2 and Sun C 5.13. I get the same error for both compilers,
> > if I use the following command and no errors if I change the order
> > of the first two machines. I also get no errors if I use
> > openmpi-dev-1708-g8497a6a for an arbitrary order of the machines.
> >
> >
> > tyr hello_1 109 which mpicc
> > /usr/local/openmpi-1.8.5_64_cc/bin/mpicc
> > tyr hello_1 110 mpiexec -np 5 -host sunpc1,linpc1,tyr,rs0 hello_1_mpi
> > ld.so.1: ssh: fatal: relocation error: file /usr/bin/ssh: symbol
> > SUNWcry_installed: referenced symbol not found
> > --------------------------------------------------------------------------
> > ORTE was unable to reliably start one or more daemons.
> > This usually is caused by:
> >
> > * not finding the required libraries and/or binaries on
> >   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
> >   settings, or configure OMPI with --enable-orterun-prefix-by-default
> >
> > * lack of authority to execute on one or more specified nodes.
> >   Please verify your allocation and authorities.
> >
> > * the inability to write startup files into /tmp
> >   (--tmpdir/orte_tmpdir_base).
> >   Please check with your sys admin to determine the correct location to
> >   use.
> >
> > * compilation of the orted with dynamic libraries when static are required
> >   (e.g., on Cray). Please check your configure cmd line and consider using
> >   one of the contrib/platform definitions for your system type.
> >
> > * an inability to create a connection back to mpirun due to a
> >   lack of common network interfaces and/or no route found between
> >   them. Please check network connectivity (including firewalls
> >   and network routing requirements).
> > --------------------------------------------------------------------------
> >
> >
> > Now the program hangs and "top" shows that "orterun" is very busy.
> >
> >    PID USERNAME THR PR NCE  SIZE   RES STATE   TIME FLTS    CPU COMMAND
> >  29550 fd1026     2  0   0 14.5M 8576K cpu01   1:06    0 47.72% orterun
> >
> >
> > tyr hello_1 116 mpiexec -np 5 -host linpc1,sunpc1,tyr,rs0 hello_1_mpi
> > Process 2 of 5 running on sunpc1
> > Process 4 of 5 running on rs0.informatik.hs-fulda.de
> > Process 3 of 5 running on tyr.informatik.hs-fulda.de
> > Process 1 of 5 running on linpc1
> > Process 0 of 5 running on linpc1
> > ...
> >
> >
> > Everything works fine with openmpi-dev-1708-g8497a6a.
> >
> > tyr hello_1 120 which mpicc
> > /usr/local/openmpi-1.9.0_64_gcc/bin/mpicc
> > tyr hello_1 121 mpiexec -np 5 -host sunpc1,linpc1,tyr,rs0 hello_1_mpi
> > Process 2 of 5 running on linpc1
> > Process 0 of 5 running on sunpc1
> > Process 1 of 5 running on sunpc1
> > Process 4 of 5 running on rs0.informatik.hs-fulda.de
> > Process 3 of 5 running on tyr.informatik.hs-fulda.de
> > ...
> >
> >
> > Any ideas what's going wrong? I would be grateful if somebody can
> > fix the problem. Thank you very much for any help in advance.
> >
> >
> > Kind regards
> >
> > Siegmar
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> > http://www.open-mpi.org/community/lists/users/2015/05/26871.php
> >