On Apr 22, 2009, at 7:35 PM, François PELLEGRINI wrote:

I have had no answers regarding the trouble (OpenMPI bug?)
I observed when combining OpenMPI and valgrind.


Sorry for the delay in getting back to you; there are so many mails and only so many hours in the day... :-(

I tried it with a newer version of OpenMPI, and the problems
persist, with new, even more worrying, error messages being displayed :

==32142== Warning: client syscall munmap tried to modify addresses 0xFFFFFFFF-0xFFE

(but this happens for all the programs I tried)

The original error messages, which are still here, were the
following :

==32143== Source and destination overlap in memcpy(0x4A73DA8, 0x4A73DB0, 16)
==32143==    at 0x40236C9: memcpy (mc_replace_strmem.c:402)
==32143==    by 0x407C9DC: ompi_ddt_copy_content_same_ddt (dt_copy.c:171)
==32143==    by 0x512EA61: ompi_coll_tuned_allgather_intra_bruck (coll_tuned_allgather.c:193)
==32143==    by 0x5126D90: ompi_coll_tuned_allgather_intra_dec_fixed (coll_tuned_decision_fixed.c:562)
==32143==    by 0x408986A: PMPI_Allgather (pallgather.c:101)
==32143==    by 0x80487D7: main (in /tmp/brol)

I do not get these "memcpy" messages when running on 2 processors.
I therefore assume it is a rounding problem with respect to the number of procs.


Good.  This is possibly related to a post from last night:

    http://www.open-mpi.org/community/lists/users/2009/04/9138.php

Some of the valgrind warnings are unavoidable, unfortunately -- e.g., those within system calls. Note that you *can* avoid the valgrind warnings in PLPA (the Linux paffinity component) if you configure OMPI --with-valgrind. This will programmatically tell valgrind that the memory access that PLPA is doing "is ok" (i.e., it's specifically intended to be an error for long/complicated reasons).

But I'm able to replicate your error (but shouldn't the 2nd buffer be the 1st + size (not 2)?) -- let me dig into it a bit... we definitely shouldn't be getting invalid writes in the convertor, etc.
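
For reference, the memcpy report quoted above can be checked by hand: the two addresses are only 8 bytes apart while 16 bytes are copied, so the ranges share 8 bytes. The following is just a minimal sketch of that arithmetic; ranges_overlap and overlap_bytes are hypothetical helper names made up for illustration, not functions from Open MPI or valgrind.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (illustration only, not Open MPI or valgrind code):
 * returns true when the byte ranges [a, a+len) and [b, b+len) share at
 * least one byte. Both ranges have the same length, as in a memcpy call. */
static bool ranges_overlap(uintptr_t a, uintptr_t b, size_t len)
{
    uintptr_t lo = a < b ? a : b;   /* lower start address  */
    uintptr_t hi = a < b ? b : a;   /* higher start address */
    return len > 0 && (hi - lo) < len;
}

/* Hypothetical helper: number of bytes shared by the two ranges,
 * or 0 when they are disjoint. */
static size_t overlap_bytes(uintptr_t a, uintptr_t b, size_t len)
{
    uintptr_t lo = a < b ? a : b;
    uintptr_t hi = a < b ? b : a;
    return (hi - lo) < len ? len - (size_t)(hi - lo) : 0;
}
```

Calling ranges_overlap(0x4A73DA8, 0x4A73DB0, 16) with the values from the report returns true, and overlap_bytes gives 8 for the same arguments. The C standard leaves memcpy undefined for overlapping ranges; memmove is the standard call that permits them, though whether that is the right fix here depends on what the datatype copy code is supposed to be doing.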

I've filed ticket #1903 about this issue:

    https://svn.open-mpi.org/trac/ompi/ticket/1903

--
Jeff Squyres
Cisco Systems

