Hello all,

I have had no answers regarding the trouble (an Open MPI bug?) I observed when combining Open MPI and valgrind.

I tried it with a newer version of Open MPI, and the problems persist, with new, even more worrying, error messages being displayed:

==32142== Warning: client syscall munmap tried to modify addresses 0xFFFFFFFF-0xFFE

(but this happens for all the programs I tried)

The original error messages, which are still there, were the following:

==32143== Source and destination overlap in memcpy(0x4A73DA8, 0x4A73DB0, 16)
==32143==    at 0x40236C9: memcpy (mc_replace_strmem.c:402)
==32143==    by 0x407C9DC: ompi_ddt_copy_content_same_ddt (dt_copy.c:171)
==32143==    by 0x512EA61: ompi_coll_tuned_allgather_intra_bruck (coll_tuned_allgather.c:193)
==32143==    by 0x5126D90: ompi_coll_tuned_allgather_intra_dec_fixed (coll_tuned_decision_fixed.c:562)
==32143==    by 0x408986A: PMPI_Allgather (pallgather.c:101)
==32143==    by 0x80487D7: main (in /tmp/brol)

I do not get these "memcpy" messages when running on 2 processors. I therefore assume it is a rounding problem with respect to the number of procs.
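For reference, the check itself is easy to trigger in isolation. The toy program below (nothing to do with Open MPI, just an illustration) produces the same "Source and destination overlap" report under valgrind: as in the addresses above, source and destination are only 8 bytes apart while 16 bytes are copied.

#include <string.h>

int main (void)
{
  int buf[6] = { 0, 1, 2, 3, 4, 5 };

  /* Destination (buf) and source (buf + 2) are 8 bytes apart on a
     machine with 4-byte ints, but 16 bytes are copied, so the two
     regions overlap; memcheck reports
     "Source and destination overlap in memcpy(...)".
     memmove() would be the overlap-safe call. */
  memcpy (buf, buf + 2, 4 * sizeof (int));

  return (0);
}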
1) The program
==============

The program "brol.c" I am running is very simple:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main (int argc, char * argv[])
{
  int   procglbnbr;
  int   proclocnum;
  int * dataloctab;
  int * dataglbtab;

  if (MPI_Init (&argc, &argv) != MPI_SUCCESS)
    exit (1);

  MPI_Comm_size (MPI_COMM_WORLD, &procglbnbr);
  MPI_Comm_rank (MPI_COMM_WORLD, &proclocnum);

  dataloctab = malloc (2 * (procglbnbr + 1) * sizeof (int));
  dataglbtab = dataloctab + 2;
  dataloctab[0] = dataloctab[1] = proclocnum;

  if (MPI_Allgather (dataloctab, 2, MPI_INT,
                     dataglbtab, 2, MPI_INT, MPI_COMM_WORLD) != MPI_SUCCESS)
    exit (1);

  MPI_Finalize ();
  return (0);
}
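Just to make the buffer layout explicit, here is a stand-alone, non-MPI sketch with the number of processes hard-coded to 3, matching the run described below; it only reflects my reading of brol.c. The send buffer (dataloctab[0..1]) and the receive buffer (dataglbtab, which starts at dataloctab + 2) live in the same allocation but are disjoint, so the overlap reported by valgrind cannot come from the arguments I pass to MPI_Allgather.

#include <stdio.h>
#include <stdlib.h>

int main (void)
{
  int   procglbnbr = 3;      /* hard-coded: what "mpirun -np 3" gives */
  int * dataloctab;
  int * dataglbtab;

  dataloctab = malloc (2 * (procglbnbr + 1) * sizeof (int)); /* 8 ints = 32 bytes */
  dataglbtab = dataloctab + 2;

  /* Send buffer: dataloctab[0..1], bytes [0..8) of the allocation.
     Receive buffer: dataglbtab[0..5] = dataloctab[2..7], bytes [8..32).
     Adjacent, but disjoint. */
  printf ("send buffer   : bytes [0..%d)\n", (int) (2 * sizeof (int)));
  printf ("receive buffer: bytes [%d..%d)\n",
          (int) ((char *) dataglbtab - (char *) dataloctab),
          (int) (2 * (procglbnbr + 1) * sizeof (int)));

  free (dataloctab);
  return (0);
}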
2) Configuration
================

I compile it with: "mpicc brol.c -o brol"
I run it with: "mpirun -np 3 valgrind ./brol"

I do not get the "memcpy" messages when running on 2 processors. I therefore assume, as I said above, that it is a rounding problem.

ompi_info says:

Package: Open MPI pelegrin@brol Distribution
Open MPI: 1.3.2rc1r21037
Open MPI SVN revision: r21037
Open MPI release date: Unreleased developer copy
Open RTE: 1.3.2rc1r21037
Open RTE SVN revision: r21037
Open RTE release date: Unreleased developer copy
OPAL: 1.3.2rc1r21037
OPAL SVN revision: r21037
OPAL release date: Unreleased developer copy
Ident string: 1.3.2rc1r21037
Prefix: /usr/local
Configured architecture: i686-pc-linux-gnu
Configure host: brol
Configured by: pelegrin
Configured on: Sun Apr 19 20:53:17 CEST 2009
Configure host: brol
Built by: pelegrin
Built on: Sun Apr 19 21:05:30 CEST 2009
Built host: brol
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: yes
Fortran90 bindings size: small
C compiler: gcc
C compiler absolute: /usr/bin/gcc
C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
Fortran77 compiler: gfortran
Fortran77 compiler abs: /usr/bin/gfortran
Fortran90 compiler: gfortran
Fortran90 compiler abs: /usr/bin/gfortran
C profiling: yes
C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: yes
C++ exceptions: no
Thread support: posix (mpi: yes, progress: no)
Sparse Groups: no
Internal debug support: yes
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: yes
libltdl support: yes
Heterogeneous support: no
mpirun default --prefix: no
MPI I/O support: yes
MPI_WTIME support: gettimeofday
Symbol visibility support: yes
FT Checkpoint support: no (checkpoint thread: no)
MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.3.2)
MCA memchecker: valgrind (MCA v2.0, API v2.0, Component v1.3.2)
MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.3.2)
MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.3.2)
MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.3.2)
MCA carto: file (MCA v2.0, API v2.0, Component v1.3.2)
MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.3.2)
MCA timer: linux (MCA v2.0, API v2.0, Component v1.3.2)
MCA installdirs: env (MCA v2.0, API v2.0, Component v1.3.2)
MCA installdirs: config (MCA v2.0, API v2.0, Component v1.3.2)
MCA dpm: orte (MCA v2.0, API v2.0, Component v1.3.2)
MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.3.2)
MCA allocator: basic (MCA v2.0, API v2.0, Component v1.3.2)
MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: basic (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: inter (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: self (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: sm (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: sync (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: tuned (MCA v2.0, API v2.0, Component v1.3.2)
MCA io: romio (MCA v2.0, API v2.0, Component v1.3.2)
MCA mpool: fake (MCA v2.0, API v2.0, Component v1.3.2)
MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.3.2)
MCA mpool: sm (MCA v2.0, API v2.0, Component v1.3.2)
MCA pml: cm (MCA v2.0, API v2.0, Component v1.3.2)
MCA pml: csum (MCA v2.0, API v2.0, Component v1.3.2)
MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.3.2)
MCA pml: v (MCA v2.0, API v2.0, Component v1.3.2)
MCA bml: r2 (MCA v2.0, API v2.0, Component v1.3.2)
MCA rcache: vma (MCA v2.0, API v2.0, Component v1.3.2)
MCA btl: self (MCA v2.0, API v2.0, Component v1.3.2)
MCA btl: sm (MCA v2.0, API v2.0, Component v1.3.2)
MCA btl: tcp (MCA v2.0, API v2.0, Component v1.3.2)
MCA topo: unity (MCA v2.0, API v2.0, Component v1.3.2)
MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.3.2)
MCA osc: rdma (MCA v2.0, API v2.0, Component v1.3.2)
MCA iof: hnp (MCA v2.0, API v2.0, Component v1.3.2)
MCA iof: orted (MCA v2.0, API v2.0, Component v1.3.2)
MCA iof: tool (MCA v2.0, API v2.0, Component v1.3.2)
MCA oob: tcp (MCA v2.0, API v2.0, Component v1.3.2)
MCA odls: default (MCA v2.0, API v2.0, Component v1.3.2)
MCA ras: slurm (MCA v2.0, API v2.0, Component v1.3.2)
MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.3.2)
MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.3.2)
MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.3.2)
MCA rml: oob (MCA v2.0, API v2.0, Component v1.3.2)
MCA routed: binomial (MCA v2.0, API v2.0, Component v1.3.2)
MCA routed: direct (MCA v2.0, API v2.0, Component v1.3.2)
MCA routed: linear (MCA v2.0, API v2.0, Component v1.3.2)
MCA plm: rsh (MCA v2.0, API v2.0, Component v1.3.2)
MCA plm: slurm (MCA v2.0, API v2.0, Component v1.3.2)
MCA filem: rsh (MCA v2.0, API v2.0, Component v1.3.2)
MCA errmgr: default (MCA v2.0, API v2.0, Component v1.3.2)
MCA ess: env (MCA v2.0, API v2.0, Component v1.3.2)
MCA ess: hnp (MCA v2.0, API v2.0, Component v1.3.2)
MCA ess: singleton (MCA v2.0, API v2.0, Component v1.3.2)
MCA ess: slurm (MCA v2.0, API v2.0, Component v1.3.2)
MCA ess: tool (MCA v2.0, API v2.0, Component v1.3.2)
MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.3.2)
MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.3.2)

I configured Open MPI with:
./configure --enable-debug --enable-mem-debug --enable-mpi-threads --enable-memchecker --with-valgrind=/usr
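As far as I understand, with --enable-memchecker and --with-valgrind the library itself talks to memcheck through valgrind's client-request interface (valgrind/memcheck.h). Just to show what that interface looks like, here is a toy example (again not Open MPI code, only the raw valgrind macros):

#include <stdlib.h>
#include <valgrind/memcheck.h>

int main (void)
{
  int * buf = malloc (4 * sizeof (int));

  /* Mark the (still unwritten) buffer as "defined" for memcheck,
     so that later reads of it are no longer reported. */
  VALGRIND_MAKE_MEM_DEFINED (buf, 4 * sizeof (int));

  /* The opposite direction: explicitly ask memcheck whether a
     region is addressable and defined. */
  VALGRIND_CHECK_MEM_IS_DEFINED (buf, 4 * sizeof (int));

  free (buf);
  return (0);
}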
3) Messages
===========

In addition to the "memcpy" message, I also get a bunch of strange messages. Some excerpts:

==32141== Conditional jump or move depends on uninitialised value(s)
==32141==    at 0x5005A03: mca_mpool_sm_alloc (mpool_sm_module.c:79)
==32141==    by 0x40393E8: ompi_free_list_grow (ompi_free_list.c:198)
==32141==    by 0x403926D: ompi_free_list_init_ex_new (ompi_free_list.c:163)
==32141==    by 0x506CEFE: ompi_free_list_init_new (ompi_free_list.h:169)
==32141==    by 0x506CD67: sm_btl_first_time_init (btl_sm.c:333)
==32141==    by 0x506D1E2: mca_btl_sm_add_procs (btl_sm.c:484)
==32141==    by 0x5062433: mca_bml_r2_add_procs (bml_r2.c:206)
==32141==    by 0x50427AE: mca_pml_ob1_add_procs (pml_ob1.c:308)
==32141==    by 0x4067F0E: ompi_mpi_init (ompi_mpi_init.c:667)
==32141==    by 0x40A4242: PMPI_Init (pinit.c:80)
==32141==    by 0x8048733: main (in /tmp/brol)

==32141== Conditional jump or move depends on uninitialised value(s)
==32141==    at 0x5005A03: mca_mpool_sm_alloc (mpool_sm_module.c:79)
==32141==    by 0x506D4D7: sm_fifo_init (btl_sm.h:213)
==32141==    by 0x506D2D0: mca_btl_sm_add_procs (btl_sm.c:510)
==32141==    by 0x5062433: mca_bml_r2_add_procs (bml_r2.c:206)
==32141==    by 0x50427AE: mca_pml_ob1_add_procs (pml_ob1.c:308)
==32141==    by 0x4067F0E: ompi_mpi_init (ompi_mpi_init.c:667)
==32141==    by 0x40A4242: PMPI_Init (pinit.c:80)
==32141==    by 0x8048733: main (in /tmp/brol)

Thanks in advance,

f.p.
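P.S.: for completeness, here is a trivial, non-MPI example of the kind of code that normally produces the "Conditional jump or move depends on uninitialised value(s)" report, just to make explicit what valgrind is complaining about; the reports above, however, point into the sm mpool/btl code during MPI_Init, not into brol.c.

#include <stdio.h>
#include <stdlib.h>

int main (void)
{
  int * p = malloc (sizeof (int));

  /* *p is never written: the branch below depends on an uninitialised
     value, and memcheck reports
     "Conditional jump or move depends on uninitialised value(s)". */
  if (*p > 0)
    printf ("positive\n");

  free (p);
  return (0);
}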