The difficulty here is that you have bundled several errors again into a single message, making it hard to keep the conversation from getting terribly confused. I was trying to address the segfault errors on cleanup, which have nothing to do with the accept being rejected.
It looks like those are being caused by MCA params that are not properly registered, so I'll keep poking for those - I fixed one earlier, but don't know where some of these others are originating.

As for the accept error: it looks like your system is rejecting TCP connect requests for some reason, and so we are erroring out. You might check for firewalls, or see if there is something odd about the networking configuration. Setting -mca oob_base_verbose 20 might help generate some useful info.

On Sep 2, 2014, at 12:43 PM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi Ralph,
>
>> I don't see any line numbers on the errors I flagged - all I
>> see are the usual memory offsets in bytes, which is of little
>> help. I'm afraid I don't know what you'd have to do under SunOS
>> to get line numbers, but I can't do much without it
>
> I used "truss" to follow function calls. The error message is in
> line 2566 of the attached file. Is the output helpful? Which
> commands would you use in gdb for Linux to track down the error,
> if you would try it on your machine?
>
>
> Kind regards
>
> Siegmar
>
>
>> On Sep 2, 2014, at 10:26 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>
>>> Hi Ralph,
>>>
>>>> Could you please configure this OMPI install with --enable-debug
>>>> so that gdb will provide line numbers where the error is occurring?
>>>> Otherwise, I'm having a hard time chasing this problem down.
>>>
>>> I always configure with "--enable-debug" and I used the following
>>> command. In my original email I had a backtrace with line
>>> numbers for both my C and Java problems.
>>>
>>> tyr openmpi-1.9a1r32657-SunOS.sparc.64_cc 119 head config.log
>>> This file contains any messages produced by compilers while
>>> running configure, to aid debugging if configure makes a mistake.
>>>
>>> It was created by Open MPI configure 1.9a1, which was
>>> generated by GNU Autoconf 2.69.  Invocation command line was
>>>
>>>   $ ../openmpi-1.9a1r32657/configure --prefix=/usr/local/openmpi-1.9_64_cc
>>> --libdir=/usr/local/openmpi-1.9_64_cc/lib64
>>> --with-jdk-bindir=/usr/local/jdk1.8.0/bin
>>> --with-jdk-headers=/usr/local/jdk1.8.0/include
>>> JAVA_HOME=/usr/local/jdk1.8.0
>>> LDFLAGS=-m64 CC=cc CXX=CC FC=f95 CFLAGS=-m64 CXXFLAGS=-m64
>>> -library=stlport4
>>> FCFLAGS=-m64 CPP=cpp CXXCPP=cpp CPPFLAGS= CXXCPPFLAGS= --enable-mpi-cxx
>>> --enable-cxx-exceptions --enable-mpi-java --enable-heterogeneous
>>> --enable-mpi-thread-multiple --with-threads=posix --with-hwloc=internal
>>> --without-verbs --with-wrapper-cflags=-m64 --enable-debug
>>>
>>>
>>> What can I do to provide line numbers for the "mca_oob_tcp_accept:
>>> accept() failed" error?
>>>
>>> Kind regards
>>>
>>> Siegmar
>>>
>>>
>>>> On Sep 2, 2014, at 6:01 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>>
>>>>> C problem:
>>>>> ==========
>>>>>
>>>>> tyr small_prog 111 mpiexec -np 1 --host linpc0 init_finalize
>>>>> [tyr.informatik.hs-fulda.de:00593] mca_oob_tcp_accept: accept() failed: Error 0 (11).
>>>>> Hello!
>>>>>
>>>>> tyr small_prog 112 mpiexec -np 1 --host sunpc0 init_finalize
>>>>> [tyr.informatik.hs-fulda.de:00597] mca_oob_tcp_accept: accept() failed: Error 0 (11).
>>>>> Hello!
>>>>>
>>>>> tyr small_prog 113 mpiexec -np 1 --host tyr init_finalize
>>>>> [tyr:00606] *** Process received signal ***
>>>>> [tyr:00606] Signal: Bus Error (10)
>>>>> [tyr:00606] Signal code: Invalid address alignment (1)
>>>>> [tyr:00606] Failing at address: ffffffff7fffd7fc
>>>>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:opal_backtrace_print+0x1c
>>>>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:0x1a4960
>>>>> /lib/sparcv9/libc.so.1:0xd8b98
>>>>> /lib/sparcv9/libc.so.1:0xcc70c
>>>>> /lib/sparcv9/libc.so.1:0xcc918
>>>>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:opal_dss_unpack_int64+0xf4 [ Signal 2096416616 (?)]
>>>>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:opal_dss_unpack_buffer+0x168
>>>>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:opal_dss_unpack+0x24c
>>>>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/openmpi/mca_pmix_native.so:0x14e10
>>>>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libmpi.so.0.0.0:ompi_mpi_init+0xd18
>>>>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libmpi.so.0.0.0:MPI_Init+0x26c
>>>>> /home/fd1026/SunOS/sparc/bin/init_finalize:main+0x10
>>>>> /home/fd1026/SunOS/sparc/bin/init_finalize:_start+0x12c
>>>>> [tyr:00606] *** End of error message ***
>>>>> --------------------------------------------------------------------------
>>>>> mpiexec noticed that process rank 0 with PID 606 on node tyr exited on signal 10 (Bus Error).
>>>>> --------------------------------------------------------------------------
>>>>> tyr small_prog 114
>>>>>
>>>>>
>>>>>
>>>>> gdb shows the following backtrace.
>>>>>
>>>>> tyr small_prog 115 /usr/local/gdb-7.6.1_64_gcc/bin/gdb /usr/local/openmpi-1.9_64_cc/bin/mpiexec
>>>>> GNU gdb (GDB) 7.6.1
>>>>> Copyright (C) 2013 Free Software Foundation, Inc.
>>>>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>>>>> This is free software: you are free to change and redistribute it.
>>>>> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
>>>>> and "show warranty" for details.
>>>>> This GDB was configured as "sparc-sun-solaris2.10".
>>>>> For bug reporting instructions, please see:
>>>>> <http://www.gnu.org/software/gdb/bugs/>...
>>>>> Reading symbols from /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/bin/orterun...done.
>>>>> (gdb) run -np 1 --host tyr init_finalize
>>>>> Starting program: /usr/local/openmpi-1.9_64_cc/bin/mpiexec -np 1 --host tyr init_finalize
>>>>> [Thread debugging using libthread_db enabled]
>>>>> [New Thread 1 (LWP 1)]
>>>>> [New LWP 2 ]
>>>>> [tyr:00628] *** Process received signal ***
>>>>> [tyr:00628] Signal: Bus Error (10)
>>>>> [tyr:00628] Signal code: Invalid address alignment (1)
>>>>> [tyr:00628] Failing at address: ffffffff7fffd73c
>>>>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:opal_backtrace_print+0x1c
>>>>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:0x1a4960
>>>>> /lib/sparcv9/libc.so.1:0xd8b98
>>>>> /lib/sparcv9/libc.so.1:0xcc70c
>>>>> /lib/sparcv9/libc.so.1:0xcc918
>>>>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:opal_dss_unpack_int64+0xf4 [ Signal 2096416616 (?)]
>>>>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:opal_dss_unpack_buffer+0x168
>>>>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:opal_dss_unpack+0x24c
>>>>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/openmpi/mca_pmix_native.so:0x14e10
>>>>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libmpi.so.0.0.0:ompi_mpi_init+0xd18
>>>>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libmpi.so.0.0.0:MPI_Init+0x26c
>>>>> /home/fd1026/SunOS/sparc/bin/init_finalize:main+0x10
>>>>> /home/fd1026/SunOS/sparc/bin/init_finalize:_start+0x12c
>>>>> [tyr:00628] *** End of error message ***
>>>>> --------------------------------------------------------------------------
>>>>> mpiexec noticed that process rank 0 with PID 628 on node tyr exited on signal 10 (Bus Error).
>>>>> --------------------------------------------------------------------------
>>>>> [
>>>>
>>>
>>
>>
>
<accept_failed.truss.tar.gz>
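
On the accept() failures, one way to rule a firewall in or out is to test plain TCP reachability between the two hosts, independent of Open MPI. Below is a minimal Python sketch of such a check. The helper names (open_listener, try_connect) are hypothetical and made up for this illustration; they are not part of Open MPI. The sketch assumes IPv4 and that Python 3 is available on both machines. It only shows whether an arbitrary TCP connection gets through, and says nothing about which port the OOB layer actually uses in a given run.

```python
import socket


def open_listener(bind_addr="0.0.0.0"):
    """Listen on an OS-chosen TCP port (hypothetical helper).

    Returns (listening_socket, port). Keep the socket open for as long
    as the remote side should be able to connect.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((bind_addr, 0))  # port 0 lets the OS pick a free port
    srv.listen(1)
    return srv, srv.getsockname()[1]


def try_connect(host, port, timeout=5.0):
    """Attempt a TCP connect to host:port (hypothetical helper).

    Returns (True, "") on success, else (False, reason).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:  # includes refused connections and timeouts
        return False, str(exc)
```

For example, call open_listener() in a Python session on tyr and note the port it returns, then run try_connect("tyr", port) from linpc0 or sunpc0. A timeout there would be consistent with something filtering traffic between the hosts, though it does not prove that this is what the OOB layer is hitting.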