Jan,

I'm using the latest Open MPI compiled with debug turned on, and valgrind 3.3.0. From your trace it looks like there is a conflict between two memory managers: free() resolves to the copy in libopen-pal, Open MPI's own memory manager, while valgrind runs its cleanup at exit. I don't have the same problem because I disable the Open MPI memory manager in my builds (configure option --without-memory-manager).
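A build like the one described above might look roughly like this. It is a sketch, not George's exact recipe: it assumes an unpacked Open MPI source tree of the same era, and the install prefix $HOME/openmpi-debug is a hypothetical example. --enable-debug corresponds to "debug turned on" and --without-memory-manager is the option named above.

```shell
# Run from the top of an unpacked Open MPI source tree.
# The prefix is a hypothetical example; pick any writable location.
./configure --prefix="$HOME/openmpi-debug" \
    --enable-debug \
    --without-memory-manager
make all install

# Put the new build first on the search paths before rerunning mpirun.
export PATH="$HOME/openmpi-debug/bin:$PATH"
export LD_LIBRARY_PATH="$HOME/openmpi-debug/lib:$LD_LIBRARY_PATH"
```

After rebuilding, the application has to be relinked (or at least rerun) against the new installation so that libopen-pal no longer supplies malloc/free.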
george.

On Aug 6, 2008, at 9:29 AM, Jan Ploski wrote:
users-boun...@open-mpi.org wrote on 08/05/2008 05:51:51 PM:

Jan,

I'm using valgrind with Open MPI on a [very] regular basis and I never had any problems. I usually want to know the execution path of MPI applications. For this I use:

mpirun -np XX valgrind --tool=callgrind -q --log-file=some_file ./my_app

I just ran your example:

mpirun -np 2 -bynode --mca btl tcp,self valgrind --tool=massif -q ./NPmpi -u 4

and I got 2 non-empty files in the current directory:

bosilca@dancer:~/NetPIPE_3.6.2$ ls -l massif.out.*
-rw------- 1 bosilca bosilca 140451 2008-08-05 11:57 massif.out.21197
-rw------- 1 bosilca bosilca 131471 2008-08-05 11:57 massif.out.21210

George,

Thanks for the info. Which version of Open MPI, compiler and valgrind did you try with? I checked on two different clusters with Open MPI 1.2.4 compiled with two different versions of the PGI compiler and valgrind 3.3.1, with the same bad result. I also noticed that the MPI processes, despite producing the expected output, do not terminate cleanly. I can see this in the stderr log (for each process):

==7909== Warning: client syscall munmap tried to modify addresses 0xD1968F92A19A72D1-0x34324E6F
==7909==
==7909== Process terminating with default action of signal 11 (SIGSEGV)
==7909==  Access not within mapped region at address 0x8053D8000
==7909==    at 0x5284996: _int_free (in /opt/openmpi-1.2.4/lib/libopen-pal.so.0.0.0)
==7909==    by 0x52837A7: free (in /opt/openmpi-1.2.4/lib/libopen-pal.so.0.0.0)
==7909==    by 0x593C76A: free_mem (in /lib64/libc-2.4.so)
==7909==    by 0x593C3E1: __libc_freeres (in /lib64/libc-2.4.so)
==7909==    by 0x491D31C: _vgnU_freeres (vg_preloaded.c:60)
==7909==    by 0x587D1C4: exit (in /lib64/libc-2.4.so)
==7909==    by 0x586815A: (below main) (in /lib64/libc-2.4.so)

That probably explains why my massif.out.* files are empty (<200 bytes long), but why do the processes crash? The same program runs OK with valgrind+MVAPICH, or with Open MPI without valgrind, on their respective clusters.
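To spot truncated profiles like these without reading ls -l by eye, a small POSIX shell helper can list the massif.out.* files that fall below a size threshold. This is a sketch, not part of either poster's workflow: the function name small_massif_files is hypothetical, and the 200-byte default only mirrors the "<200 bytes" observation above; it is not a limit defined by valgrind.

```shell
# Hypothetical helper: list massif output files smaller than a byte threshold.
# A tiny file usually means the process died before massif wrote its snapshots.
# Usage: small_massif_files [dir] [min_bytes]   (defaults: . and 200)
# The body is a subshell so its variables do not leak into the caller.
small_massif_files() (
    dir=${1:-.}
    min_bytes=${2:-200}
    for f in "$dir"/massif.out.*; do
        # With no matches the glob stays literal; skip that case.
        [ -e "$f" ] || continue
        # Strip padding that some wc implementations add.
        size=$(wc -c < "$f" | tr -d ' \t')
        if [ "$size" -lt "$min_bytes" ]; then
            printf '%s (%s bytes)\n' "$f" "$size"
        fi
    done
)
```

For example, running small_massif_files . after the mpirun ... valgrind --tool=massif command prints one line per suspiciously small profile and nothing when every process wrote a full one.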
I experience this both with a simple test program and with a real application (WRF).

Regards,
Jan Ploski
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users