Not sure what you mean. The bug reports were not correlated, and the
previous one was on the 1.6 series, which I understand had a fairly
outdated version of ROMIO.

Last year we had a bug report from a user who had a problem with a
0-byte file view, and we added special code in OMPIO to handle that. I
would have to dig up the details, but if I remember correctly OMPIO
doesn't even go into the derived datatype code if the size of the file
view is zero, and I suspect that's why we pass the test. I was curious
whether that fix would work for this scenario as well.
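
To illustrate the idea, here is a rough sketch only. It is not the
actual OMPIO or ROMIO code: the (offset, length) block representation
and the names flatten_blocks and file_view_is_empty are made up for
this example. It shows the two strategies discussed in this thread:
skipping the datatype processing entirely when the view selects zero
bytes, and ignoring zero-length blocks while flattening.

```python
from typing import List, Tuple

# Hypothetical representation of one block of a file view:
# (byte offset, byte length). Not a real OMPIO or ROMIO structure.
Block = Tuple[int, int]


def file_view_is_empty(blocks: List[Block]) -> bool:
    """Return True when the view selects no bytes at all.

    A caller can use this to skip the datatype processing entirely.
    """
    return sum(length for _, length in blocks) == 0


def flatten_blocks(blocks: List[Block]) -> List[Block]:
    """Drop zero-length blocks and merge blocks that touch in the file."""
    flat: List[Block] = []
    for offset, length in blocks:
        if length == 0:
            # A zero-length block describes no data, so it is ignored.
            continue
        if flat and flat[-1][0] + flat[-1][1] == offset:
            # Contiguous with the previous block: extend that block.
            prev_offset, prev_length = flat[-1]
            flat[-1] = (prev_offset, prev_length + length)
        else:
            flat.append((offset, length))
    return flat
```

With this sketch, a process that owns no matrix block would see
file_view_is_empty(...) return True and never build a flattened list,
so there would be nothing to free at file close.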

Edgar

On 5/19/2014 11:23 AM, Rob Latham wrote:
> 
> 
> On 05/15/2014 08:32 AM, Edgar Gabriel wrote:
>> Could you try, just out of curiosity, forcing the use of OMPIO? E.g.
>> mpirun --mca io ompio ....
> 
> Edgar, what is in the air that there are now three bug reports against
> ROMIO's flattening code in the last month?
> 
> We've fixed this upstream in ROMIO by ignoring zero-length blocks, but
> George Bosilca suggested Open-MPI's fix for that might have been too
> aggressive.
> 
> For those of you not on the mpich-discuss list, we've determined that
> whatever problem MPICH had with Oriol Canela-Xandri's test case has been
> fixed in the latest from-git versions.
> 
> OMPIO uses OpenMPI's datatype processing, so if they both handle
> zero-length blocks the same way, everything's fine.  ROMIO processes
> datatypes internally (providing a third implementation of MPI datatype
> processing.  sigh.).  If there's a disagreement about how to handle
> these special cases, memory errors such as you report can happen.
> 
> ==rob
> 
>>
>> Thanks
>> Edgar
>>
>> On 5/15/2014 3:56 AM, CANELA-XANDRI Oriol wrote:
>>> Hi, I installed and tried version 1.8.1, but it also fails. I see
>>> the error when there are some processes without any matrix block.
>>> It's not a common situation, but it leaves me unsure whether I am
>>> doing something wrong. The error I get is:
>>>
>>> *** Error in `./binary': free(): invalid size: 0x0000000000a34c00 ***
>>> [oriol-VirtualBox:13975] *** Process received signal ***
>>> [oriol-VirtualBox:13975] Signal: Aborted (6)
>>> [oriol-VirtualBox:13975] Signal code:  (-6)
>>> [oriol-VirtualBox:13969] *** Process received signal ***
>>> [oriol-VirtualBox:13969] Signal: Aborted (6)
>>> [oriol-VirtualBox:13969] Signal code:  (-6)
>>> ======= Backtrace: =========
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f5844a8d996]
>>> [oriol-VirtualBox:13969] [ 0]
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x36ff0)[0x7f06a50a7ff0]
>>> [oriol-VirtualBox:13969] [ 1]
>>> /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f06a50a7f77]
>>> [oriol-VirtualBox:13969] [ 2]
>>> /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f06a50ab5e8]
>>> [oriol-VirtualBox:13969] [ 3]
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x744fb)[0x7f06a50e54fb]
>>> [oriol-VirtualBox:13969] [ 4]
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f06a50f1996]
>>> [oriol-VirtualBox:13969] [ 5]
>>> /usr/local/lib/openmpi/mca_io_romio.so(ADIOI_Delete_flattened+0x62)[0x7f0691e12c02]
>>>
>>> [oriol-VirtualBox:13969] [ 6]
>>> /usr/local/lib/openmpi/mca_io_romio.so(ADIO_Close+0x1f9)[0x7f0691df7189]
>>> [oriol-VirtualBox:13969] [ 7]
>>> /usr/local/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0xe8)[0x7f0691de9dd8]
>>>
>>> [oriol-VirtualBox:13969] [ 8]
>>> /usr/local/lib/libmpi.so.1(+0x3a2c6)[0x7f06a5ea02c6]
>>> [oriol-VirtualBox:13969] [ 9]
>>> /usr/local/lib/libmpi.so.1(ompi_file_close+0x41)[0x7f06a5ea0811]
>>> [oriol-VirtualBox:13969] [10]
>>> /usr/local/lib/libmpi.so.1(PMPI_File_close+0x78)[0x7f06a5edc118]
>>> [oriol-VirtualBox:13969] [11] ./binary[0x42099e]
>>> [oriol-VirtualBox:13969] [12] ./binary[0x48ed86]
>>> [oriol-VirtualBox:13969] [13] ./binary[0x40e49f]
>>> [oriol-VirtualBox:13969] [14]
>>> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f06a5092de5]
>>> [oriol-VirtualBox:13969] [15] ./binary[0x40d679]
>>> [oriol-VirtualBox:13969] *** End of error message ***
>>> [oriol-VirtualBox:13975] [ 0]
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x36ff0)[0x7f1857201ff0]
>>> [oriol-VirtualBox:13975] [ 1]
>>> /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f1857201f77]
>>> [oriol-VirtualBox:13975] [ 2]
>>> /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f18572055e8]
>>> [oriol-VirtualBox:13975] [ 3]
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x744fb)[0x7f185723f4fb]
>>> [oriol-VirtualBox:13975] [ 4]
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f185724b996]
>>> [oriol-VirtualBox:13975] [ 5]
>>> /usr/local/lib/openmpi/mca_io_romio.so(ADIOI_Delete_flattened+0x62)[0x7f18459d2c02]
>>>
>>> [oriol-VirtualBox:13975] [ 6]
>>> /usr/local/lib/openmpi/mca_io_romio.so(ADIO_Close+0x1f9)[0x7f18459b7189]
>>> [oriol-VirtualBox:13975] [ 7]
>>> /usr/local/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0xe8)[0x7f18459a9dd8]
>>>
>>> [oriol-VirtualBox:13975] [ 8]
>>> /usr/local/lib/libmpi.so.1(+0x3a2c6)[0x7f1857ffa2c6]
>>> [oriol-VirtualBox:13975] [ 9]
>>> /usr/local/lib/libmpi.so.1(ompi_file_close+0x41)[0x7f1857ffa811]
>>> [oriol-VirtualBox:13975] [10]
>>> /usr/local/lib/libmpi.so.1(PMPI_File_close+0x78)[0x7f1858036118]
>>> [oriol-VirtualBox:13975] [11] ./binary[0x42099e]
>>> [oriol-VirtualBox:13975] [12] ./binary[0x48ed86]
>>> [oriol-VirtualBox:13975] [13] ./binary[0x40e49f]
>>> [oriol-VirtualBox:13975] [14]
>>> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f18571ecde5]
>>> [oriol-VirtualBox:13975] [15] ./binary[0x40d679]
>>> [oriol-VirtualBox:13975] *** End of error message ***
>>> [oriol-VirtualBox:13972] *** Process received signal ***
>>> [oriol-VirtualBox:13972] Signal: Aborted (6)
>>> [oriol-VirtualBox:13972] Signal code:  (-6)
>>> [oriol-VirtualBox:13972] [ 0]
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x36ff0)[0x7f5844a43ff0]
>>> [oriol-VirtualBox:13972] [ 1]
>>> /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f5844a43f77]
>>> [oriol-VirtualBox:13972] [ 2]
>>> /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f5844a475e8]
>>> [oriol-VirtualBox:13972] [ 3]
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x744fb)[0x7f5844a814fb]
>>> [oriol-VirtualBox:13972] [ 4]
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f5844a8d996]
>>> [oriol-VirtualBox:13972] [ 5]
>>> /usr/local/lib/openmpi/mca_io_romio.so(ADIOI_Delete_flattened+0x62)[0x7f58315f2c02]
>>>
>>> [oriol-VirtualBox:13972] [ 6]
>>> /usr/local/lib/openmpi/mca_io_romio.so(ADIO_Close+0x1f9)[0x7f58315d7189]
>>> [oriol-VirtualBox:13972] [ 7]
>>> /usr/local/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0xe8)[0x7f58315c9dd8]
>>>
>>> [oriol-VirtualBox:13972] [ 8]
>>> /usr/local/lib/libmpi.so.1(+0x3a2c6)[0x7f584583c2c6]
>>> [oriol-VirtualBox:13972] [ 9]
>>> /usr/local/lib/libmpi.so.1(ompi_file_close+0x41)[0x7f584583c811]
>>> [oriol-VirtualBox:13972] [10]
>>> /usr/local/lib/libmpi.so.1(PMPI_File_close+0x78)[0x7f5845878118]
>>> [oriol-VirtualBox:13972] [11] ./binary[0x42099e]
>>> [oriol-VirtualBox:13972] [12] ./binary[0x48ed86]
>>> [oriol-VirtualBox:13972] [13] ./binary[0x40e49f]
>>> [oriol-VirtualBox:13972] [14]
>>> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f5844a2ede5]
>>> [oriol-VirtualBox:13972] [15] ./binary[0x40d679]
>>> [oriol-VirtualBox:13972] *** End of error message ***
>>> --------------------------------------------------------------------------
>>>
>>> mpirun noticed that process rank 2 with PID 13969 on node
>>> oriol-VirtualBox exited on signal 6 (Aborted).
>>> --------------------------------------------------------------------------
>>>
>>>
>>>
>>
>>
>>
>>
> 

-- 
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
