On 05/15/2014 08:32 AM, Edgar Gabriel wrote:
Could you try, just out of curiosity, forcing the use of OMPIO? E.g.:
mpirun --mca io ompio ....
Edgar, what is in the air that there are now three bug reports against
ROMIO's flattening code in the last month?
We've fixed this upstream in ROMIO by ignoring zero-length blocks, but
George Bosilca suggested Open-MPI's fix for that might have been too
aggressive.
For those of you not on the mpich-discuss list, we've determined that
whatever problem MPICH had with Oriol Canela-Xandri's test case has been
fixed in the latest from-git versions.
OMPIO uses OpenMPI's datatype processing, so if they both handle
zero-length blocks the same way, everything's fine. ROMIO processes
datatypes internally (providing a third implementation of MPI datatype
processing. sigh.). If there's a disagreement about how to handle
these special cases, memory errors such as you report can happen.
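
To make the disagreement concrete, here is a minimal sketch of how two flattening passes can diverge. It is not ROMIO's or Open MPI's actual code: the function names (flatten_blocks, count_blocks), the skip_empty flag, and the sample block lengths and displacements are all hypothetical, chosen only to illustrate the idea. One pass counts the blocks to size the flattened list, the other fills it; if only one of them skips zero-length blocks, the two no longer agree on how many entries exist.

```python
from typing import List, Tuple


def count_blocks(blocklens: List[int], skip_empty: bool) -> int:
    """Count the entries a flattened representation would need."""
    return sum(1 for b in blocklens if not (skip_empty and b == 0))


def flatten_blocks(
    blocklens: List[int], displs: List[int], extent: int, skip_empty: bool
) -> List[Tuple[int, int]]:
    """Turn (block length, displacement) pairs, given in elements,
    into (byte offset, byte length) pairs."""
    flat = []
    for blen, disp in zip(blocklens, displs):
        if skip_empty and blen == 0:
            continue  # policy A: zero-length blocks are dropped
        flat.append((disp * extent, blen * extent))
    return flat


if __name__ == "__main__":
    # Hypothetical indexed type: three blocks of 8-byte elements,
    # the middle block being empty.
    blocklens = [2, 0, 3]
    displs = [0, 4, 6]

    sized_for = count_blocks(blocklens, skip_empty=True)
    filled = flatten_blocks(blocklens, displs, 8, skip_empty=False)

    # The counting pass and the filling pass disagree by one entry.
    print("entries allocated:", sized_for)
    print("entries written:  ", len(filled))
```

In Python the mismatch is harmless. In C, writing one more entry than was allocated corrupts the heap, and the damage may only surface later, for example when the buffer is freed.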
==rob
Thanks
Edgar
On 5/15/2014 3:56 AM, CANELA-XANDRI Oriol wrote:
Hi, I installed version 1.8.1 and tried it, but it also fails. I see the error
when some processes do not hold any matrix block. It's not a common
situation, but it makes me unsure whether I am doing something wrong.
The error I get is:
*** Error in `./binary': free(): invalid size: 0x0000000000a34c00 ***
[oriol-VirtualBox:13975] *** Process received signal ***
[oriol-VirtualBox:13975] Signal: Aborted (6)
[oriol-VirtualBox:13975] Signal code: (-6)
[oriol-VirtualBox:13969] *** Process received signal ***
[oriol-VirtualBox:13969] Signal: Aborted (6)
[oriol-VirtualBox:13969] Signal code: (-6)
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f5844a8d996]
[oriol-VirtualBox:13969] [ 0]
/lib/x86_64-linux-gnu/libc.so.6(+0x36ff0)[0x7f06a50a7ff0]
[oriol-VirtualBox:13969] [ 1]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f06a50a7f77]
[oriol-VirtualBox:13969] [ 2]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f06a50ab5e8]
[oriol-VirtualBox:13969] [ 3]
/lib/x86_64-linux-gnu/libc.so.6(+0x744fb)[0x7f06a50e54fb]
[oriol-VirtualBox:13969] [ 4]
/lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f06a50f1996]
[oriol-VirtualBox:13969] [ 5]
/usr/local/lib/openmpi/mca_io_romio.so(ADIOI_Delete_flattened+0x62)[0x7f0691e12c02]
[oriol-VirtualBox:13969] [ 6]
/usr/local/lib/openmpi/mca_io_romio.so(ADIO_Close+0x1f9)[0x7f0691df7189]
[oriol-VirtualBox:13969] [ 7]
/usr/local/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0xe8)[0x7f0691de9dd8]
[oriol-VirtualBox:13969] [ 8]
/usr/local/lib/libmpi.so.1(+0x3a2c6)[0x7f06a5ea02c6]
[oriol-VirtualBox:13969] [ 9]
/usr/local/lib/libmpi.so.1(ompi_file_close+0x41)[0x7f06a5ea0811]
[oriol-VirtualBox:13969] [10]
/usr/local/lib/libmpi.so.1(PMPI_File_close+0x78)[0x7f06a5edc118]
[oriol-VirtualBox:13969] [11] ./binary[0x42099e]
[oriol-VirtualBox:13969] [12] ./binary[0x48ed86]
[oriol-VirtualBox:13969] [13] ./binary[0x40e49f]
[oriol-VirtualBox:13969] [14]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f06a5092de5]
[oriol-VirtualBox:13969] [15] ./binary[0x40d679]
[oriol-VirtualBox:13969] *** End of error message ***
[oriol-VirtualBox:13975] [ 0]
/lib/x86_64-linux-gnu/libc.so.6(+0x36ff0)[0x7f1857201ff0]
[oriol-VirtualBox:13975] [ 1]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f1857201f77]
[oriol-VirtualBox:13975] [ 2]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f18572055e8]
[oriol-VirtualBox:13975] [ 3]
/lib/x86_64-linux-gnu/libc.so.6(+0x744fb)[0x7f185723f4fb]
[oriol-VirtualBox:13975] [ 4]
/lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f185724b996]
[oriol-VirtualBox:13975] [ 5]
/usr/local/lib/openmpi/mca_io_romio.so(ADIOI_Delete_flattened+0x62)[0x7f18459d2c02]
[oriol-VirtualBox:13975] [ 6]
/usr/local/lib/openmpi/mca_io_romio.so(ADIO_Close+0x1f9)[0x7f18459b7189]
[oriol-VirtualBox:13975] [ 7]
/usr/local/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0xe8)[0x7f18459a9dd8]
[oriol-VirtualBox:13975] [ 8]
/usr/local/lib/libmpi.so.1(+0x3a2c6)[0x7f1857ffa2c6]
[oriol-VirtualBox:13975] [ 9]
/usr/local/lib/libmpi.so.1(ompi_file_close+0x41)[0x7f1857ffa811]
[oriol-VirtualBox:13975] [10]
/usr/local/lib/libmpi.so.1(PMPI_File_close+0x78)[0x7f1858036118]
[oriol-VirtualBox:13975] [11] ./binary[0x42099e]
[oriol-VirtualBox:13975] [12] ./binary[0x48ed86]
[oriol-VirtualBox:13975] [13] ./binary[0x40e49f]
[oriol-VirtualBox:13975] [14]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f18571ecde5]
[oriol-VirtualBox:13975] [15] ./binary[0x40d679]
[oriol-VirtualBox:13975] *** End of error message ***
[oriol-VirtualBox:13972] *** Process received signal ***
[oriol-VirtualBox:13972] Signal: Aborted (6)
[oriol-VirtualBox:13972] Signal code: (-6)
[oriol-VirtualBox:13972] [ 0]
/lib/x86_64-linux-gnu/libc.so.6(+0x36ff0)[0x7f5844a43ff0]
[oriol-VirtualBox:13972] [ 1]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f5844a43f77]
[oriol-VirtualBox:13972] [ 2]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f5844a475e8]
[oriol-VirtualBox:13972] [ 3]
/lib/x86_64-linux-gnu/libc.so.6(+0x744fb)[0x7f5844a814fb]
[oriol-VirtualBox:13972] [ 4]
/lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f5844a8d996]
[oriol-VirtualBox:13972] [ 5]
/usr/local/lib/openmpi/mca_io_romio.so(ADIOI_Delete_flattened+0x62)[0x7f58315f2c02]
[oriol-VirtualBox:13972] [ 6]
/usr/local/lib/openmpi/mca_io_romio.so(ADIO_Close+0x1f9)[0x7f58315d7189]
[oriol-VirtualBox:13972] [ 7]
/usr/local/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0xe8)[0x7f58315c9dd8]
[oriol-VirtualBox:13972] [ 8]
/usr/local/lib/libmpi.so.1(+0x3a2c6)[0x7f584583c2c6]
[oriol-VirtualBox:13972] [ 9]
/usr/local/lib/libmpi.so.1(ompi_file_close+0x41)[0x7f584583c811]
[oriol-VirtualBox:13972] [10]
/usr/local/lib/libmpi.so.1(PMPI_File_close+0x78)[0x7f5845878118]
[oriol-VirtualBox:13972] [11] ./binary[0x42099e]
[oriol-VirtualBox:13972] [12] ./binary[0x48ed86]
[oriol-VirtualBox:13972] [13] ./binary[0x40e49f]
[oriol-VirtualBox:13972] [14]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f5844a2ede5]
[oriol-VirtualBox:13972] [15] ./binary[0x40d679]
[oriol-VirtualBox:13972] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 13969 on node oriol-VirtualBox
exited on signal 6 (Aborted).
--------------------------------------------------------------------------
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA