Hi,

First, I suggest you decide which Open MPI version you want to use.

The most up-to-date versions are 2.0.3 and 2.1.1.

Then please provide all the info Jeff previously requested.

Ideally, you would write a simple, standalone program that exhibits
the issue, so we can reproduce and investigate it.

If not, I suggest you use another MPI library (MVAPICH, Intel MPI or
any MPICH-based MPI) and see if the issue is still there.

If the double free error still occurs, it is very likely the issue comes
from your application and not the MPI library.

If you have a parallel debugger such as Allinea DDT, then you can run
your program under the debugger with thorough memory debugging. The
program will halt when the memory corruption occurs, and this will be a
hint (app issue vs MPI issue).

If you did not configure Open MPI with --enable-debug, then please do so
and try again. You will increase the likelihood of trapping such a memory
corruption error earlier, and you will get a clean Open MPI stack trace
if a crash occurs.
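For reference, a debug build could look like the sketch below. The source directory, the install prefix and -j4 are assumptions, not values from this thread; --enable-debug is the only option taken from the advice above. The script only prints the commands by default and runs them when DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: rebuild Open MPI with debugging enabled.
# SRC_DIR and PREFIX are hypothetical paths; adjust them to your system.
SRC_DIR="${SRC_DIR:-$HOME/openmpi-2.1.1}"
PREFIX="${PREFIX:-$HOME/openmpi-2.1.1-debug}"

# Print each command by default; set DRY_RUN=0 to really execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_debug_ompi() {
    run cd "$SRC_DIR" || return 1
    # --enable-debug turns on Open MPI's internal debugging checks.
    run ./configure --prefix="$PREFIX" --enable-debug || return 1
    run make -j4 || return 1
    run make install || return 1
}

build_debug_ompi
```

After installing, put $PREFIX/bin first in PATH (and $PREFIX/lib in LD_LIBRARY_PATH), then rebuild your application with the compiler wrappers from that install so it really uses the debug library.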

You might also want to try

mpirun --mca btl tcp,self ...

and see if you get a different behavior.

This will only use TCP for inter-process communication, which is way
easier to debug than shared memory or RDMA.
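To compare a default run with a TCP-only run, something like the following sketch may help. run_job, the output file names and the MPIRUN/APP/NP variables are hypothetical helpers introduced here; -np 4 and cfd_software are taken from the original report. It also keeps stdout and stderr of each run in separate files.

```shell
#!/bin/sh
# Sketch: run the same job with different mpirun options and keep the
# output of each run for comparison. All variable names are assumptions.
MPIRUN="${MPIRUN:-mpirun}"
APP="${APP:-cfd_software}"
NP="${NP:-4}"

# run_job TAG [extra mpirun options...]
# Stores stdout in TAG.out and stderr in TAG.err.
run_job() {
    tag="$1"
    shift
    $MPIRUN "$@" -np "$NP" "$APP" > "$tag.out" 2> "$tag.err"
}

# Typical use (uncomment to run):
# run_job default
# run_job tcp --mca btl tcp,self
# diff default.err tcp.err
```

Some glibc versions print the "double free or corruption" report to the terminal rather than to stderr. If the .err file stays empty, exporting LIBC_FATAL_STDERR_=1 to the ranks (mpirun -x LIBC_FATAL_STDERR_=1 ...) may be worth a try.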

Cheers,

Gilles

----- Original Message -----

    Hello,
    I found a thread about Intel MPI (although I am using gfortran 4.8.5
    and OpenMPI 2.1.1):
    https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/564266
    The error the OP gets is the same as mine:

    *** glibc detected *** ./a.out: double free or corruption (!prev): 0x00007fc6d0000c80 ***
    ======= Backtrace: =========
    /lib64/libc.so.6[0x3411e75e66]
    /lib64/libc.so.6[0x3411e789b3]

    So the explanation given in that post is this -
    "From their examination our Development team concluded the 
underlying problem with openmpi 1.8.6 resulted from mixing out-of-date/
incompatible Fortran RTLs. In short, there were older static Fortran RTL 
bodies incorporated in the openmpi library that when mixed with newer 
Fortran RTL led to the failure. They found the issue is resolved in the 
newer openmpi-1.10.1rc2 and recommend resolving requires using a newer 
openmpi release with our 15.0 (or newer) release." Could this be 
possible with my version as well ?


    I am willing to debug this provided I am given some clue on how to 
approach my problem. At the moment I am unable to proceed further and 
the only thing I can add is I ran tests with the sequential form of my 
application and it is much slower although I am using shared memory and 
all the cores are in the same machine.

    Best regards,
    Ashwin.





    On Tue, Jun 13, 2017 at 5:52 PM, ashwin .D <winas...@gmail.com> 
wrote:

        Also, when I build Open MPI and run make check, I get these
        errors. Am I clear to proceed or is my installation broken? This
        is on Ubuntu 16.04 LTS.

        ==================================================
           Open MPI 2.1.1: test/datatype/test-suite.log
        ==================================================

        # TOTAL: 9
        # PASS:  8
        # SKIP:  0
        # XFAIL: 0
        # FAIL:  1
        # XPASS: 0
        # ERROR: 0

        .. contents:: :depth: 2

        FAIL: external32
        ================

        /home/t/openmpi-2.1.1/test/datatype/.libs/lt-external32: symbol 
lookup error: /home/openmpi-2.1.1/test/datatype/.libs/lt-external32: 
undefined symbol: ompi_datatype_pack_external_size
        FAIL external32 (exit status:

        On Tue, Jun 13, 2017 at 5:24 PM, ashwin .D <winas...@gmail.com> 
wrote:

            Hello,
                      I am using OpenMPI 2.0.0 with a computational 
fluid dynamics software and I am encountering a series of errors when 
running this with mpirun. This is my lscpu output

            CPU(s):                4
            On-line CPU(s) list:   0-3
            Thread(s) per core:    2
            Core(s) per socket:    2
            Socket(s):             1

            and I am running OpenMPI's mpirun in the following way

            mpirun -np 4  cfd_software



            and I get double free or corruption every single time.



            I have two questions -



            1) I am unable to capture the standard error that mpirun
            throws in a file. How can I go about capturing the standard
            error of mpirun?

            2) Has this error, i.e. double free or corruption, been
            reported by others? Is there a bug fix available?



            Regards,

            Ashwin.





_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
