On Tue, 2008-07-08 at 18:01 -0700, Tom Riddle wrote:
> Thanks Ashley, after going through your suggestions we tried our test
> with valgrind 3.3.0 and with glibc-devel-2.5-18.el5_1.1, both exhibit
> the same results. A simple non-MPI test prog however returns expected
> responses, so valgrind itself look ok. We then checked that the same
> (shared) libc gets linked in both the MPI and non-MPI cases, adding
> -pthread to the cc command line yields the same result, the only
> difference it appears is the open mpi libraries.
>
> Now mpicc links against libopen-pal which defines malloc for it's own
> purposes. The big difference seems to be that libopen-pal.so is
> providing its own malloc replacement

This will be the problem. I've tested on an openmpi (1.2.6) machine here and I see exactly the same behaviour as you. I re-compiled the application without libopen-pal and valgrind works as expected. To do this I used mpicc -show to see what compile line it was using and ran the command myself without the -lopen-pal option. This clearly isn't an acceptable long-term solution but might help you in the short term.

I'm an MPI expert but normally work on a different MPI to openmpi. I have however done a lot of work with Valgrind on different platforms, so I pick up questions related to it here. I think this problem is going to need input from one of the openmpi guys...

The problem seems to be that the presence of malloc() and free() functions in the libopen-pal library is preventing valgrind from intercepting these functions in glibc, and hence dramatically reducing the benefit which valgrind brings.

Ashley Pittman.
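
For anyone who wants to script the short-term workaround described above, here is a minimal sketch in Python. It only edits a link line that has already been captured as text (for example, pasted from the output of mpicc -show); it does not call mpicc itself. The helper name strip_library and the example link line are hypothetical, and the real line will depend on the local installation.

```python
import shlex


def strip_library(link_line: str, library: str = "open-pal") -> str:
    """Return link_line with every "-l<library>" token removed.

    link_line is assumed to be the text printed by the compiler wrapper;
    all other tokens are kept in their original order.
    """
    unwanted = "-l" + library
    kept = [tok for tok in shlex.split(link_line) if tok != unwanted]
    # Re-quote each token so the result can be pasted back into a shell.
    return " ".join(shlex.quote(tok) for tok in kept)


if __name__ == "__main__":
    # Hypothetical wrapper output, for illustration only.
    example = "gcc -pthread -L/opt/openmpi/lib -lmpi -lopen-rte -lopen-pal -ldl"
    print(strip_library(example + " test.c -o test"))
```

The printed command can then be run by hand and the resulting binary checked under valgrind. As noted above, this is only a stopgap for debugging, not a long-term fix.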