Just pushed the fix to master today - not required for 2.x.
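
For the record: the SPARC failure looks like a classic alignment trap. The backtrace stops in parse_connect_ack at

    332   *rank = *(int *)msg;

with msg == 0x1001c97bb, an odd address, and SPARC raises SIGBUS ("Invalid address alignment") on any 4-byte load that is not 4-byte aligned. I haven't re-checked the exact commit, so take the following only as a sketch of the usual portable pattern for that kind of read - copy the bytes into an aligned local instead of casting and dereferencing (the helper name is mine, not from the pmix sources):

    #include <string.h>

    /* Sketch: read a 4-byte int from a possibly unaligned position in a
     * message buffer.  memcpy carries no alignment requirement, so this is
     * safe on SPARC, whereas *(int *)msg is not when msg is unaligned. */
    static int read_int_unaligned(const char *msg)
    {
        int value;
        memcpy(&value, msg, sizeof(value));
        return value;
    }

The same consideration would apply to any other multi-byte loads done via pointer casts while walking that connect-ack buffer.
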
> On Jun 9, 2016, at 10:44 AM, Siegmar Gross
> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>
> Hi Ralph,
>
> On 08.06.2016 at 21:19, rhc54 wrote:
>> Closed #1771 <https://github.com/open-mpi/ompi/issues/1771> via #1772
>> <https://github.com/open-mpi/ompi/pull/1772>.
>
> Thank you very much for your help. Now I have new problems
> with the same program on my Sparc and x86_64 Solaris machines.
>
> tyr hello_1 106 ompi_info | grep -e "OPAL repo revision:" -e "C compiler absolute:"
> OPAL repo revision: dev-4251-g1f651d1
> C compiler absolute: /usr/local/gcc-5.1.0/bin/gcc
>
> tyr hello_1 107 mpiexec -np 2 hello_1_mpi
> [tyr:08647] *** Process received signal ***
> [tyr:08647] Signal: Bus Error (10)
> [tyr:08647] Signal code: Invalid address alignment (1)
> [tyr:08647] Failing at address: 1001c94eb
> /export2/prog/SunOS_sparc/openmpi-master_64_gcc/lib64/libopen-pal.so.0.0.0:opal_backtrace_print+0x2c
> /export2/prog/SunOS_sparc/openmpi-master_64_gcc/lib64/libopen-pal.so.0.0.0:0xdefa4
> /lib/sparcv9/libc.so.1:0xd8b98
> /lib/sparcv9/libc.so.1:0xcc70c
> /lib/sparcv9/libc.so.1:0xcc918
> /export2/prog/SunOS_sparc/openmpi-master_64_gcc/lib64/openmpi/mca_pmix_pmix114.so:0x8c800
> [ Signal 10 (BUS)]
> /export2/prog/SunOS_sparc/openmpi-master_64_gcc/lib64/openmpi/mca_pmix_pmix114.so:0x8cba4
> /export2/prog/SunOS_sparc/openmpi-master_64_gcc/lib64/openmpi/mca_pmix_pmix114.so:0x8de10
> /export2/prog/SunOS_sparc/openmpi-master_64_gcc/lib64/libopen-pal.so.0.0.0:0xee62c
> /export2/prog/SunOS_sparc/openmpi-master_64_gcc/lib64/libopen-pal.so.0.0.0:0xee948
> /export2/prog/SunOS_sparc/openmpi-master_64_gcc/lib64/libopen-pal.so.0.0.0:opal_libevent2022_event_base_loop+0x310
> /export2/prog/SunOS_sparc/openmpi-master_64_gcc/lib64/openmpi/mca_pmix_pmix114.so:0x4b7f4
> /lib/sparcv9/libc.so.1:0xd8a6c
> [tyr:08647] *** End of error message ***
> Bus error
>
> tyr hello_1 108 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
> GNU gdb (GDB) 7.6.1
> ...
> Reading symbols from /export2/prog/SunOS_sparc/openmpi-master_64_gcc/bin/orterun...done.
> (gdb) set args -np 2 hello_1_mpi
> (gdb) r
> Starting program: /usr/local/openmpi-master_64_gcc/bin/mpiexec -np 2 hello_1_mpi
> [Thread debugging using libthread_db enabled]
> [New Thread 1 (LWP 1)]
> [New LWP 2 ]
> [New LWP 3 ]
> [New LWP 4 ]
> [New LWP 5 ]
> [New Thread 3 (LWP 3)]
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 3 (LWP 3)]
> 0xffffffff7988c800 in parse_connect_ack (msg=0x1001c97bb "", len=13,
> nspace=0xffffffff797fbac0,
> rank=0xffffffff797fbaa8, version=0xffffffff797fbab8,
> cred=0xffffffff797fbab0)
> at
> ../../../../../../openmpi-dev-4251-g1f651d1/opal/mca/pmix/pmix114/pmix/src/server/pmix_server_listener.c:332
> 332 *rank = *(int *)msg;
> (gdb) bt
> #0 0xffffffff7988c800 in parse_connect_ack (msg=0x1001c97bb "", len=13,
> nspace=0xffffffff797fbac0, rank=0xffffffff797fbaa8,
> version=0xffffffff797fbab8,
> cred=0xffffffff797fbab0)
> at
> ../../../../../../openmpi-dev-4251-g1f651d1/opal/mca/pmix/pmix114/pmix/src/server/pmix_server_listener.c:332
> #1 0xffffffff7988cbac in pmix_server_authenticate (sd=29,
> out_rank=0xffffffff797fbc0c,
> peer=0xffffffff797fbc10)
> at
> ../../../../../../openmpi-dev-4251-g1f651d1/opal/mca/pmix/pmix114/pmix/src/server/pmix_server_listener.c:403
> #2 0xffffffff7988de18 in connection_handler (sd=-1, flags=4,
> cbdata=0x1001cdc30)
> at
> ../../../../../../openmpi-dev-4251-g1f651d1/opal/mca/pmix/pmix114/pmix/src/server/pmix_server_listener.c:564
> #3 0xffffffff7ecee634 in event_process_active_single_queue ()
> from /usr/local/openmpi-master_64_gcc/lib64/libopen-pal.so.0
> #4 0xffffffff7ecee950 in event_process_active ()
> from /usr/local/openmpi-master_64_gcc/lib64/libopen-pal.so.0
> #5 0xffffffff7ecef22c in opal_libevent2022_event_base_loop ()
> from /usr/local/openmpi-master_64_gcc/lib64/libopen-pal.so.0
> #6 0xffffffff7984b7fc in progress_engine (obj=0x1001bb0b0)
> at
> ../../../../../../openmpi-dev-4251-g1f651d1/opal/mca/pmix/pmix114/pmix/src/util/progress_threads.c:52
> #7 0xffffffff7d9d8a74 in _lwp_start () from /lib/sparcv9/libc.so.1
> #8 0xffffffff7d9d8a74 in _lwp_start () from /lib/sparcv9/libc.so.1
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> (gdb) print msg
> $1 = 0x1001c97bb ""
> (gdb) print (int *)msg
> $2 = (int *) 0x1001c97bb
> (gdb) print *(int *)msg
> $3 = 0
> (gdb)
>
> sunpc1 fd1026 102 ompi_info | grep -e "OPAL repo revision:" -e "C compiler absolute:"
> OPAL repo revision: dev-4251-g1f651d1
> C compiler absolute: /usr/local/gcc-5.1.0/bin/gcc
>
> sunpc1 fd1026 103 mpiexec -np 2 hello_1_mpi
> [sunpc1:27530] PMIX ERROR: NOT-SUPPORTED in file
> ../../../../../../openmpi-dev-4251-g1f651d1/opal/mca/pmix/pmix114/pmix/src/server/pmix_server_listener.c
> at line 540
> [sunpc1:27532] PMIX ERROR: UNREACHABLE in file
> ../../../../../../openmpi-dev-4251-g1f651d1/opal/mca/pmix/pmix114/pmix/src/client/pmix_client.c
> at line 983
> [sunpc1:27532] PMIX ERROR: UNREACHABLE in file
> ../../../../../../openmpi-dev-4251-g1f651d1/opal/mca/pmix/pmix114/pmix/src/client/pmix_client.c
> at line 199
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> pmix init failed
> --> Returned value Unreachable (-12) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ess_init failed
> --> Returned value Unreachable (-12) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> ompi_mpi_init: ompi_rte_init failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> *** and potentially your MPI job)
> [sunpc1:27532] Local abort before MPI_INIT completed completed successfully,
> but am not able to aggregate error messages, and not able to guarantee that
> all other processes were killed!
> -------------------------------------------------------
> Primary job terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec detected that one or more processes exited with non-zero status, thus
> causing
> the job to be terminated. The first process to do so was:
>
> Process name: [[26724,1],0]
> Exit code: 1
> --------------------------------------------------------------------------
> sunpc1 fd1026 104
>
> I hope you can solve these problems as well. Thank you very much
> in advance for any help.
>
>
> Kind regards
>
> Siegmar