On 05/15/2014 08:32 AM, Edgar Gabriel wrote:
Could you try, just out of curiosity, forcing it to use OMPIO? e.g.
mpirun --mca io ompio ....

Edgar, what is in the air? There have now been three bug reports against ROMIO's flattening code in the last month.

We've fixed this upstream in ROMIO by ignoring zero-length blocks, but George Bosilca suggested that Open MPI's fix for it might have been too aggressive.

For those of you not on the mpich-discuss list: we've determined that whatever problem MPICH had with Oriol Canela-Xandri's test case has been fixed in the latest versions from git.

OMPIO uses Open MPI's datatype processing, so as long as both layers handle zero-length blocks the same way, everything's fine. ROMIO processes datatypes internally (providing a third implementation of MPI datatype processing; sigh). If the implementations disagree about how to handle these special cases, memory errors such as the one you report can happen.
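To make "ignoring zero-length blocks" concrete, here is a minimal sketch in plain C (no MPI calls) of flattening an indexed layout into (offset, length) pairs. The names `flat_list`, `flatten_indexed`, and `flat_list_free` are hypothetical and only illustrate the idea; this is not ROMIO's actual flattening code. The point it shows is that the pass that counts blocks (and sizes the allocation) and the pass that fills them must apply the same rule. If one pass counts zero-length blocks and the other skips them, buffers get written or freed under the wrong assumptions, which is the kind of mismatch that can surface later as a `free(): invalid size` abort.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical flattened representation: parallel arrays of byte
 * offsets and block lengths (illustrative only, not ROMIO's struct). */
typedef struct {
    int count;      /* number of non-empty blocks kept */
    long *offsets;  /* byte offset of each kept block */
    long *lengths;  /* byte length of each kept block */
} flat_list;

/* Single predicate shared by both passes, so they cannot disagree. */
static int keep_block(long len) { return len > 0; }

/* Flatten n blocks, dropping zero-length ones. Returns 0 on success,
 * -1 if allocation fails (out is then left empty). */
int flatten_indexed(int n, const long *blocklens, const long *displs,
                    flat_list *out)
{
    int kept = 0;
    for (int i = 0; i < n; i++)          /* pass 1: count */
        if (keep_block(blocklens[i]))
            kept++;

    out->count = 0;
    out->offsets = NULL;
    out->lengths = NULL;
    if (kept == 0)
        return 0;                         /* a rank with no data: empty list */

    out->offsets = malloc(kept * sizeof *out->offsets);
    out->lengths = malloc(kept * sizeof *out->lengths);
    if (!out->offsets || !out->lengths) {
        free(out->offsets);
        free(out->lengths);
        out->offsets = NULL;
        out->lengths = NULL;
        return -1;
    }

    int j = 0;
    for (int i = 0; i < n; i++) {         /* pass 2: fill */
        if (keep_block(blocklens[i])) {
            out->offsets[j] = displs[i];
            out->lengths[j] = blocklens[i];
            j++;
        }
    }
    assert(j == kept);                    /* both passes agreed */
    out->count = kept;
    return 0;
}

/* Release a list; safe for empty lists because free(NULL) is a no-op. */
void flat_list_free(flat_list *fl)
{
    free(fl->offsets);
    free(fl->lengths);
    fl->offsets = NULL;
    fl->lengths = NULL;
    fl->count = 0;
}
```

For example, blocks of lengths {4, 0, 8, 0} at displacements {0, 16, 32, 48} flatten to two entries, (0, 4) and (32, 8). A rank that owns no data, like the processes without a matrix block in the report below, gets `count == 0` and NULL arrays, which can still be released safely.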

==rob


Thanks
Edgar

On 5/15/2014 3:56 AM, CANELA-XANDRI Oriol wrote:
Hi, I installed and tried version 1.8.1, but it also fails. I see the error
when some processes do not own any matrix block. It's not a common
situation, but it makes me unsure whether I am doing something wrong.
The error I get is:

*** Error in `./binary': free(): invalid size: 0x0000000000a34c00 ***
[oriol-VirtualBox:13975] *** Process received signal ***
[oriol-VirtualBox:13975] Signal: Aborted (6)
[oriol-VirtualBox:13975] Signal code:  (-6)
[oriol-VirtualBox:13969] *** Process received signal ***
[oriol-VirtualBox:13969] Signal: Aborted (6)
[oriol-VirtualBox:13969] Signal code:  (-6)
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f5844a8d996]
[oriol-VirtualBox:13969] [ 0] 
/lib/x86_64-linux-gnu/libc.so.6(+0x36ff0)[0x7f06a50a7ff0]
[oriol-VirtualBox:13969] [ 1] 
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f06a50a7f77]
[oriol-VirtualBox:13969] [ 2] 
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f06a50ab5e8]
[oriol-VirtualBox:13969] [ 3] 
/lib/x86_64-linux-gnu/libc.so.6(+0x744fb)[0x7f06a50e54fb]
[oriol-VirtualBox:13969] [ 4] 
/lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f06a50f1996]
[oriol-VirtualBox:13969] [ 5] 
/usr/local/lib/openmpi/mca_io_romio.so(ADIOI_Delete_flattened+0x62)[0x7f0691e12c02]
[oriol-VirtualBox:13969] [ 6] 
/usr/local/lib/openmpi/mca_io_romio.so(ADIO_Close+0x1f9)[0x7f0691df7189]
[oriol-VirtualBox:13969] [ 7] 
/usr/local/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0xe8)[0x7f0691de9dd8]
[oriol-VirtualBox:13969] [ 8] 
/usr/local/lib/libmpi.so.1(+0x3a2c6)[0x7f06a5ea02c6]
[oriol-VirtualBox:13969] [ 9] 
/usr/local/lib/libmpi.so.1(ompi_file_close+0x41)[0x7f06a5ea0811]
[oriol-VirtualBox:13969] [10] 
/usr/local/lib/libmpi.so.1(PMPI_File_close+0x78)[0x7f06a5edc118]
[oriol-VirtualBox:13969] [11] ./binary[0x42099e]
[oriol-VirtualBox:13969] [12] ./binary[0x48ed86]
[oriol-VirtualBox:13969] [13] ./binary[0x40e49f]
[oriol-VirtualBox:13969] [14] 
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f06a5092de5]
[oriol-VirtualBox:13969] [15] ./binary[0x40d679]
[oriol-VirtualBox:13969] *** End of error message ***
[oriol-VirtualBox:13975] [ 0] 
/lib/x86_64-linux-gnu/libc.so.6(+0x36ff0)[0x7f1857201ff0]
[oriol-VirtualBox:13975] [ 1] 
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f1857201f77]
[oriol-VirtualBox:13975] [ 2] 
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f18572055e8]
[oriol-VirtualBox:13975] [ 3] 
/lib/x86_64-linux-gnu/libc.so.6(+0x744fb)[0x7f185723f4fb]
[oriol-VirtualBox:13975] [ 4] 
/lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f185724b996]
[oriol-VirtualBox:13975] [ 5] 
/usr/local/lib/openmpi/mca_io_romio.so(ADIOI_Delete_flattened+0x62)[0x7f18459d2c02]
[oriol-VirtualBox:13975] [ 6] 
/usr/local/lib/openmpi/mca_io_romio.so(ADIO_Close+0x1f9)[0x7f18459b7189]
[oriol-VirtualBox:13975] [ 7] 
/usr/local/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0xe8)[0x7f18459a9dd8]
[oriol-VirtualBox:13975] [ 8] 
/usr/local/lib/libmpi.so.1(+0x3a2c6)[0x7f1857ffa2c6]
[oriol-VirtualBox:13975] [ 9] 
/usr/local/lib/libmpi.so.1(ompi_file_close+0x41)[0x7f1857ffa811]
[oriol-VirtualBox:13975] [10] 
/usr/local/lib/libmpi.so.1(PMPI_File_close+0x78)[0x7f1858036118]
[oriol-VirtualBox:13975] [11] ./binary[0x42099e]
[oriol-VirtualBox:13975] [12] ./binary[0x48ed86]
[oriol-VirtualBox:13975] [13] ./binary[0x40e49f]
[oriol-VirtualBox:13975] [14] 
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f18571ecde5]
[oriol-VirtualBox:13975] [15] ./binary[0x40d679]
[oriol-VirtualBox:13975] *** End of error message ***
[oriol-VirtualBox:13972] *** Process received signal ***
[oriol-VirtualBox:13972] Signal: Aborted (6)
[oriol-VirtualBox:13972] Signal code:  (-6)
[oriol-VirtualBox:13972] [ 0] 
/lib/x86_64-linux-gnu/libc.so.6(+0x36ff0)[0x7f5844a43ff0]
[oriol-VirtualBox:13972] [ 1] 
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f5844a43f77]
[oriol-VirtualBox:13972] [ 2] 
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f5844a475e8]
[oriol-VirtualBox:13972] [ 3] 
/lib/x86_64-linux-gnu/libc.so.6(+0x744fb)[0x7f5844a814fb]
[oriol-VirtualBox:13972] [ 4] 
/lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f5844a8d996]
[oriol-VirtualBox:13972] [ 5] 
/usr/local/lib/openmpi/mca_io_romio.so(ADIOI_Delete_flattened+0x62)[0x7f58315f2c02]
[oriol-VirtualBox:13972] [ 6] 
/usr/local/lib/openmpi/mca_io_romio.so(ADIO_Close+0x1f9)[0x7f58315d7189]
[oriol-VirtualBox:13972] [ 7] 
/usr/local/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0xe8)[0x7f58315c9dd8]
[oriol-VirtualBox:13972] [ 8] 
/usr/local/lib/libmpi.so.1(+0x3a2c6)[0x7f584583c2c6]
[oriol-VirtualBox:13972] [ 9] 
/usr/local/lib/libmpi.so.1(ompi_file_close+0x41)[0x7f584583c811]
[oriol-VirtualBox:13972] [10] 
/usr/local/lib/libmpi.so.1(PMPI_File_close+0x78)[0x7f5845878118]
[oriol-VirtualBox:13972] [11] ./binary[0x42099e]
[oriol-VirtualBox:13972] [12] ./binary[0x48ed86]
[oriol-VirtualBox:13972] [13] ./binary[0x40e49f]
[oriol-VirtualBox:13972] [14] 
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f5844a2ede5]
[oriol-VirtualBox:13972] [15] ./binary[0x40d679]
[oriol-VirtualBox:13972] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 13969 on node oriol-VirtualBox 
exited on signal 6 (Aborted).
--------------------------------------------------------------------------

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
