Hi,

First of all, I suggest you decide which Open MPI version you want to use; the most up-to-date versions are 2.0.3 and 2.1.1. Then please provide all the info Jeff previously requested.

Ideally, you would write a simple and standalone program that exhibits the issue, so we can reproduce and investigate it.
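For example, something as small as the sketch below is often enough to start with. The MPI_Allreduce loop is only a stand-in for whatever communication pattern your CFD code really uses, so adapt it to the calls you actually make:

  program reproducer
    use mpi
    implicit none
    integer :: ierr, rank, nprocs, i
    double precision :: local(1024), global(1024)

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

    ! repeat a collective many times, roughly what a solver's reduction step does
    local = dble(rank)
    do i = 1, 1000
       call MPI_Allreduce(local, global, size(local), MPI_DOUBLE_PRECISION, &
                          MPI_SUM, MPI_COMM_WORLD, ierr)
    end do

    if (rank == 0) print *, 'done, global(1) =', global(1)
    call MPI_Finalize(ierr)
  end program reproducer

Build it with the same compiler wrappers as your application (e.g. mpifort reproducer.f90 -o reproducer) and run it with mpirun -np 4 ./reproducer; if this already triggers the double free, the problem becomes much easier to chase.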
If not, I suggest you use another MPI library (MVAPICH, Intel MPI or any MPICH-based MPI) and see if the issue is still there. If the double free error still occurs, it is very likely the issue comes from your application and not the MPI library.

If you have a parallel debugger such as Allinea DDT, then you can run your program under the debugger with thorough memory debugging. The program will halt when the memory corruption occurs, and this will be a hint (app issue vs. MPI issue).

If you did not configure Open MPI with --enable-debug, then please do so and try again; you will increase the likelihood of trapping such a memory corruption error earlier, and you will get a clean Open MPI stack trace if a crash occurs.

You might also want to try mpirun --mca btl tcp,self ... and see if you get a different behavior. This will only use TCP for inter-process communication, and this is way easier to debug than shared memory or RDMA.

Cheers,

Gilles

----- Original Message -----

Hello,

I found a thread with Intel MPI (although I am using gfortran 4.8.5 and OpenMPI 2.1.1) - https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/564266 - but the error the OP gets is the same as mine:

*** glibc detected *** ./a.out: double free or corruption (!prev): 0x00007fc6d0000c80 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3411e75e66]
/lib64/libc.so.6[0x3411e789b3]

So the explanation given in that post is this:

"From their examination our Development team concluded the underlying problem with openmpi 1.8.6 resulted from mixing out-of-date/incompatible Fortran RTLs. In short, there were older static Fortran RTL bodies incorporated in the openmpi library that when mixed with newer Fortran RTL led to the failure. They found the issue is resolved in the newer openmpi-1.10.1rc2 and recommend resolving requires using a newer openmpi release with our 15.0 (or newer) release."

Could this be possible with my version as well? I am willing to debug this, provided I am given some clue on how to approach my problem. At the moment I am unable to proceed further, and the only thing I can add is that I ran tests with the sequential form of my application and it is much slower, although I am using shared memory and all the cores are in the same machine.

Best regards,
Ashwin.

On Tue, Jun 13, 2017 at 5:52 PM, ashwin .D <winas...@gmail.com> wrote:

Also, when I try to build and run a make check, I get these errors - am I clear to proceed, or is my installation broken? This is on Ubuntu 16.04 LTS.

==================================================
   Open MPI 2.1.1: test/datatype/test-suite.log
==================================================
# TOTAL: 9
# PASS:  8
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0
.. contents:: :depth: 2

FAIL: external32
================

/home/t/openmpi-2.1.1/test/datatype/.libs/lt-external32: symbol lookup error: /home/openmpi-2.1.1/test/datatype/.libs/lt-external32: undefined symbol: ompi_datatype_pack_external_size
FAIL external32 (exit status:

On Tue, Jun 13, 2017 at 5:24 PM, ashwin .D <winas...@gmail.com> wrote:

Hello,

I am using OpenMPI 2.0.0 with a computational fluid dynamics software and I am encountering a series of errors when running this with mpirun. This is my lscpu output:

CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           1

and I am running OpenMPI's mpirun in the following way:

mpirun -np 4 cfd_software

and I get "double free or corruption" every single time.

I have two questions -

1) I am unable to capture the standard error that mpirun throws in a file. How can I go about capturing the standard error of mpirun?

2) Has this error, i.e. double free or corruption, been reported by others? Is a bug fix available?

Regards,
Ashwin.
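Regarding question 1): mpirun writes to the shell's standard output and standard error like any other command, so plain redirection should be enough to keep the glibc messages in a file, e.g. (assuming a bash-like shell, with the log file name chosen arbitrarily):

  mpirun -np 4 cfd_software > cfd.log 2>&1

Open MPI's mpirun also has an --output-filename option that can send each rank's output to its own file; check mpirun --help on your installation for the exact syntax of your version. Combining either of these with the --mca btl tcp,self run that Gilles suggested gives a log that is much easier to attach to a report.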