[OMPI users] Seg error when using v5.0.1
Hello,

I upgraded one of the systems to v5.0.1 and have compiled everything exactly as dozens of previous times with v4. I wasn't expecting any issue (and the compilations didn't report anything out of the ordinary) but running several apps has resulted in error messages such as:

    Backtrace for this error:
    #0  0x7f7c9571f51f in ???
            at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
    #1  0x7f7c957823fe in __GI___libc_free
            at ./malloc/malloc.c:3368
    #2  0x7f7c93a635c3 in ???
    #3  0x7f7c95f84048 in ???
    #4  0x7f7c95f1cef1 in ???
    #5  0x7f7c95e34b7b in ???
    #6  0x6e05be in ???
    #7  0x6e58d7 in ???
    #8  0x405d2c in ???
    #9  0x7f7c95706d8f in __libc_start_call_main
            at ../sysdeps/nptl/libc_start_call_main.h:58
    #10 0x7f7c95706e3f in __libc_start_main_impl
            at ../csu/libc-start.c:392
    #11 0x405d64 in ???
    #12 0x in ???

OS is Ubuntu 22.04, OpenMPI was built with GCC 13.2, and before building OpenMPI, I had previously built the hwloc (2.10.0) library at /usr/lib/x86_64-linux-gnu. Maybe I'm missing something pretty basic, but the problem seems to be related to memory allocation.

Thanks.
Re: [OMPI users] Seg error when using v5.0.1
Hello,

This looks like memory corruption. Do you have more details on what your app is doing? I don't see any MPI calls inside the call stack. Could you rebuild Open MPI with debug information enabled (by adding `--enable-debug` to configure)? If this error occurs on singleton runs (1 process) then you can easily attach gdb to it to get a better stack trace. Also, valgrind may help pin down the problem by telling you which memory block is being free'd here.

Thanks
Joseph

On 1/30/24 07:41, afernandez via users wrote:
> [...]
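Most frames in the backtrace from the original post are unresolved (`???`), which is why a better stack trace is being asked for. As a rough reading aid, the sketch below parses that backtrace and sorts the frames by address range. It rests on an assumption that is not stated anywhere in the thread: on x86_64 Linux, shared libraries are typically mapped at high addresses (`0x7f...`), while a non-PIE executable is loaded at low addresses (near `0x400000`). The names `classify` and `SHARED_LIB_MIN` are made up for this illustration, and the threshold is a heuristic, not a rule.

```python
import re

# Backtrace copied from the original post (frame #12 is kept as posted).
BACKTRACE = """\
#0 0x7f7c9571f51f in ???
 at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#1 0x7f7c957823fe in __GI___libc_free
 at ./malloc/malloc.c:3368
#2 0x7f7c93a635c3 in ???
#3 0x7f7c95f84048 in ???
#4 0x7f7c95f1cef1 in ???
#5 0x7f7c95e34b7b in ???
#6 0x6e05be in ???
#7 0x6e58d7 in ???
#8 0x405d2c in ???
#9 0x7f7c95706d8f in __libc_start_call_main
 at ../sysdeps/nptl/libc_start_call_main.h:58
#10 0x7f7c95706e3f in __libc_start_main_impl
 at ../csu/libc-start.c:392
#11 0x405d64 in ???
#12 0x in ???
"""

# Matches "#<n> 0x<hex> in <symbol>"; the hex part may be empty (frame #12).
FRAME_RE = re.compile(r"^#(\d+)\s+0x([0-9a-fA-F]*)\s+in\s+(\S+)")

# ASSUMPTION: heuristic boundary for "looks like a shared-library mapping".
SHARED_LIB_MIN = 0x7F0000000000


def classify(backtrace):
    """Return a list of (index, hex_address, symbol, region) tuples."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if not match:
            continue  # skips the "at file:line" continuation lines
        index = int(match.group(1))
        hexaddr = match.group(2)
        symbol = match.group(3)
        if not hexaddr:
            region = "unknown"
        elif int(hexaddr, 16) >= SHARED_LIB_MIN:
            region = "shared library"
        else:
            region = "executable"
        frames.append((index, hexaddr, symbol, region))
    return frames


frames = classify(BACKTRACE)

# Unresolved frames that appear to lie inside the application binary.
unresolved_exe = [
    f"0x{addr}" for _, addr, sym, region in frames
    if sym == "???" and region == "executable"
]
# Unresolved frames that appear to lie inside shared libraries.
unresolved_lib = [
    f"0x{addr}" for _, addr, sym, region in frames
    if sym == "???" and region == "shared library"
]

for index, addr, sym, region in frames:
    print(f"#{index:<2} 0x{addr:<14} {sym:<26} {region}")
print("unresolved in executable:", unresolved_exe)
print("unresolved in shared libraries:", len(unresolved_lib))
```

If the heuristic holds, the low addresses it lists could be passed to `addr2line -f -e <executable> <address>`, which only gives useful names when the application was built with debug symbols. The unresolved library frames need the libraries' load addresses, so a debugger such as gdb is the more practical route for those.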
Re: [OMPI users] Seg error when using v5.0.1
Hi Joseph,

It's happening with several apps including WRF. I was trying to find a quick answer or fix but it seems that I'll have to recompile it in debug mode. Will report back with the extra info.

Thanks.

Joseph Schuchart via users wrote:
> [...]