Just pushed the fix to master today - not required for 2.x.

> On Jun 9, 2016, at 10:44 AM, Siegmar Gross 
> <siegmar.gr...@informatik.hs-fulda.de> wrote:
> 
> Hi Ralph,
> 
> On 08.06.2016 at 21:19, rhc54 wrote:
>> Closed #1771 <https://github.com/open-mpi/ompi/issues/1771> via #1772 
>> <https://github.com/open-mpi/ompi/pull/1772>.
> 
> Thank you very much for your help. Now I have new problems
> with the same program on my SPARC and x86_64 Solaris machines.
> 
> tyr hello_1 106 ompi_info | grep -e "OPAL repo revision:" -e "C compiler absolute:"
>      OPAL repo revision: dev-4251-g1f651d1
>     C compiler absolute: /usr/local/gcc-5.1.0/bin/gcc
> 
> tyr hello_1 107 mpiexec -np 2 hello_1_mpi
> [tyr:08647] *** Process received signal ***
> [tyr:08647] Signal: Bus Error (10)
> [tyr:08647] Signal code: Invalid address alignment (1)
> [tyr:08647] Failing at address: 1001c94eb
> /export2/prog/SunOS_sparc/openmpi-master_64_gcc/lib64/libopen-pal.so.0.0.0:opal_backtrace_print+0x2c
> /export2/prog/SunOS_sparc/openmpi-master_64_gcc/lib64/libopen-pal.so.0.0.0:0xdefa4
> /lib/sparcv9/libc.so.1:0xd8b98
> /lib/sparcv9/libc.so.1:0xcc70c
> /lib/sparcv9/libc.so.1:0xcc918
> /export2/prog/SunOS_sparc/openmpi-master_64_gcc/lib64/openmpi/mca_pmix_pmix114.so:0x8c800 [ Signal 10 (BUS)]
> /export2/prog/SunOS_sparc/openmpi-master_64_gcc/lib64/openmpi/mca_pmix_pmix114.so:0x8cba4
> /export2/prog/SunOS_sparc/openmpi-master_64_gcc/lib64/openmpi/mca_pmix_pmix114.so:0x8de10
> /export2/prog/SunOS_sparc/openmpi-master_64_gcc/lib64/libopen-pal.so.0.0.0:0xee62c
> /export2/prog/SunOS_sparc/openmpi-master_64_gcc/lib64/libopen-pal.so.0.0.0:0xee948
> /export2/prog/SunOS_sparc/openmpi-master_64_gcc/lib64/libopen-pal.so.0.0.0:opal_libevent2022_event_base_loop+0x310
> /export2/prog/SunOS_sparc/openmpi-master_64_gcc/lib64/openmpi/mca_pmix_pmix114.so:0x4b7f4
> /lib/sparcv9/libc.so.1:0xd8a6c
> [tyr:08647] *** End of error message ***
> Bus error
> 
> tyr hello_1 108 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
> GNU gdb (GDB) 7.6.1
> ...
> Reading symbols from 
> /export2/prog/SunOS_sparc/openmpi-master_64_gcc/bin/orterun...done.
> (gdb) set args -np 2 hello_1_mpi
> (gdb) r
> Starting program: /usr/local/openmpi-master_64_gcc/bin/mpiexec -np 2 
> hello_1_mpi
> [Thread debugging using libthread_db enabled]
> [New Thread 1 (LWP 1)]
> [New LWP    2        ]
> [New LWP    3        ]
> [New LWP    4        ]
> [New LWP    5        ]
> [New Thread 3 (LWP 3)]
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 3 (LWP 3)]
> 0xffffffff7988c800 in parse_connect_ack (msg=0x1001c97bb "", len=13, 
> nspace=0xffffffff797fbac0,
>    rank=0xffffffff797fbaa8, version=0xffffffff797fbab8, 
> cred=0xffffffff797fbab0)
>    at 
> ../../../../../../openmpi-dev-4251-g1f651d1/opal/mca/pmix/pmix114/pmix/src/server/pmix_server_listener.c:332
> 332             *rank = *(int *)msg;
> (gdb) bt
> #0  0xffffffff7988c800 in parse_connect_ack (msg=0x1001c97bb "", len=13,
>    nspace=0xffffffff797fbac0, rank=0xffffffff797fbaa8, 
> version=0xffffffff797fbab8,
>    cred=0xffffffff797fbab0)
>    at 
> ../../../../../../openmpi-dev-4251-g1f651d1/opal/mca/pmix/pmix114/pmix/src/server/pmix_server_listener.c:332
> #1  0xffffffff7988cbac in pmix_server_authenticate (sd=29, 
> out_rank=0xffffffff797fbc0c,
>    peer=0xffffffff797fbc10)
>    at 
> ../../../../../../openmpi-dev-4251-g1f651d1/opal/mca/pmix/pmix114/pmix/src/server/pmix_server_listener.c:403
> #2  0xffffffff7988de18 in connection_handler (sd=-1, flags=4, 
> cbdata=0x1001cdc30)
>    at 
> ../../../../../../openmpi-dev-4251-g1f651d1/opal/mca/pmix/pmix114/pmix/src/server/pmix_server_listener.c:564
> #3  0xffffffff7ecee634 in event_process_active_single_queue ()
>   from /usr/local/openmpi-master_64_gcc/lib64/libopen-pal.so.0
> #4  0xffffffff7ecee950 in event_process_active ()
>   from /usr/local/openmpi-master_64_gcc/lib64/libopen-pal.so.0
> #5  0xffffffff7ecef22c in opal_libevent2022_event_base_loop ()
>   from /usr/local/openmpi-master_64_gcc/lib64/libopen-pal.so.0
> #6  0xffffffff7984b7fc in progress_engine (obj=0x1001bb0b0)
>    at 
> ../../../../../../openmpi-dev-4251-g1f651d1/opal/mca/pmix/pmix114/pmix/src/util/progress_threads.c:52
> #7  0xffffffff7d9d8a74 in _lwp_start () from /lib/sparcv9/libc.so.1
> #8  0xffffffff7d9d8a74 in _lwp_start () from /lib/sparcv9/libc.so.1
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> (gdb) print msg
> $1 = 0x1001c97bb ""
> (gdb) print (int *)msg
> $2 = (int *) 0x1001c97bb
> (gdb) print *(int *)msg
> $3 = 0
> (gdb)
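
Side note for anyone chasing the same trace: this is a classic strict-alignment fault. msg points at 0x1001c97bb, which is not 4-byte aligned, and SPARC raises SIGBUS on a misaligned int load where x86_64 quietly tolerates it; that also matches the "Invalid address alignment" signal code in the run above. The usual portable pattern for such reads, shown here only as a generic sketch with illustrative names (not taken from the PMIx sources, and no claim about what the commit on master actually does), is:

    /* Sketch only: reading an int from a possibly unaligned buffer on
     * strict-alignment CPUs such as SPARC. */
    #include <string.h>

    static int read_unaligned_int(const char *buf)
    {
        int value;
        /* memcpy lets the compiler emit safe (byte-wise or otherwise)
         * loads; *(int *)buf would trap if buf is not 4-byte aligned. */
        memcpy(&value, buf, sizeof(value));
        return value;
    }

    /* Instead of:      *rank = *(int *)msg;
     * one would write: *rank = read_unaligned_int(msg);              */

With a constant size, memcpy is typically inlined, so on platforms that do allow unaligned access there is normally no measurable cost.
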
>
>
> sunpc1 fd1026 102 ompi_info | grep -e "OPAL repo revision:" -e "C compiler absolute:"
>      OPAL repo revision: dev-4251-g1f651d1
>     C compiler absolute: /usr/local/gcc-5.1.0/bin/gcc
> 
> sunpc1 fd1026 103 mpiexec -np 2 hello_1_mpi
> [sunpc1:27530] PMIX ERROR: NOT-SUPPORTED in file 
> ../../../../../../openmpi-dev-4251-g1f651d1/opal/mca/pmix/pmix114/pmix/src/server/pmix_server_listener.c
>  at line 540
> [sunpc1:27532] PMIX ERROR: UNREACHABLE in file 
> ../../../../../../openmpi-dev-4251-g1f651d1/opal/mca/pmix/pmix114/pmix/src/client/pmix_client.c
>  at line 983
> [sunpc1:27532] PMIX ERROR: UNREACHABLE in file 
> ../../../../../../openmpi-dev-4251-g1f651d1/opal/mca/pmix/pmix114/pmix/src/client/pmix_client.c
>  at line 199
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> 
>  pmix init failed
>  --> Returned value Unreachable (-12) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> 
>  orte_ess_init failed
>  --> Returned value Unreachable (-12) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
> 
>  ompi_mpi_init: ompi_rte_init failed
>  --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> [sunpc1:27532] Local abort before MPI_INIT completed completed successfully, 
> but am not able to aggregate error messages, and not able to guarantee that 
> all other processes were killed!
> -------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec detected that one or more processes exited with non-zero status, thus 
> causing
> the job to be terminated. The first process to do so was:
> 
>  Process name: [[26724,1],0]
>  Exit code:    1
> --------------------------------------------------------------------------
> sunpc1 fd1026 104
>
> Hopefully you can solve this problem as well. Thank you very much
> in advance for any help.
> 
> 
> Kind regards
> 
> Siegmar
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/06/29421.php
