Not sure what you mean. The bug reports were not correlated, and the previous one was on the 1.6 series, which as I understand it had a fairly outdated version of ROMIO.
Last year we had a bug report from a user who hit a problem with a 0-byte file view, and we added special code to handle that in OMPIO. I would have to dig up the details, but if I remember correctly OMPIO doesn't even go into the derived datatype code if the size of the file view is zero, and I suspect that's why we pass the test. I was curious whether that fix would work for this scenario as well.

Edgar

On 5/19/2014 11:23 AM, Rob Latham wrote:
>
>
> On 05/15/2014 08:32 AM, Edgar Gabriel wrote:
>> could you try just for curiosity to force to use OMPIO? e.g.
>> mpirun --mca io ompio ....
>
> Edgar, what is in the air that there are now three bug reports against
> ROMIO's flattening code in the last month?
>
> We've fixed this upstream in ROMIO by ignoring zero-length blocks, but
> George Bosilca suggested Open-MPI's fix for that might have been too
> aggressive.
>
> For those of you not on the mpich-discuss list, we've determined that
> whatever problem MPICH had with Oriol Canela-Xandri's test case has been
> fixed in the latest from-git versions.
>
> OMPIO uses OpenMPI's datatype processing, so if they both handle
> zero-length blocks the same way, everything's fine. ROMIO processes
> datatypes internally (providing a third implementation of MPI datatype
> processing. sigh.). If there's a disagreement about how to handle
> these special cases, memory errors such as you report can happen.
>
> ==rob
>
>>
>> Thanks
>> Edgar
>>
>> On 5/15/2014 3:56 AM, CANELA-XANDRI Oriol wrote:
>>> Hi, I installed and tried with version 1.8.1 but it also fails. I see
>>> the error when there are some processes without any matrix block.
>>> It's not a common situation, but this makes me feel unsure about I am
>>> not doing something wrong.
>>> The error I get is:
>>>
>>> *** Error in `./binary': free(): invalid size: 0x0000000000a34c00 ***
>>> [oriol-VirtualBox:13975] *** Process received signal ***
>>> [oriol-VirtualBox:13975] Signal: Aborted (6)
>>> [oriol-VirtualBox:13975] Signal code: (-6)
>>> [oriol-VirtualBox:13969] *** Process received signal ***
>>> [oriol-VirtualBox:13969] Signal: Aborted (6)
>>> [oriol-VirtualBox:13969] Signal code: (-6)
>>> ======= Backtrace: =========
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f5844a8d996]
>>> [oriol-VirtualBox:13969] [ 0]
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x36ff0)[0x7f06a50a7ff0]
>>> [oriol-VirtualBox:13969] [ 1]
>>> /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f06a50a7f77]
>>> [oriol-VirtualBox:13969] [ 2]
>>> /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f06a50ab5e8]
>>> [oriol-VirtualBox:13969] [ 3]
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x744fb)[0x7f06a50e54fb]
>>> [oriol-VirtualBox:13969] [ 4]
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f06a50f1996]
>>> [oriol-VirtualBox:13969] [ 5]
>>> /usr/local/lib/openmpi/mca_io_romio.so(ADIOI_Delete_flattened+0x62)[0x7f0691e12c02]
>>>
>>> [oriol-VirtualBox:13969] [ 6]
>>> /usr/local/lib/openmpi/mca_io_romio.so(ADIO_Close+0x1f9)[0x7f0691df7189]
>>> [oriol-VirtualBox:13969] [ 7]
>>> /usr/local/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0xe8)[0x7f0691de9dd8]
>>>
>>> [oriol-VirtualBox:13969] [ 8]
>>> /usr/local/lib/libmpi.so.1(+0x3a2c6)[0x7f06a5ea02c6]
>>> [oriol-VirtualBox:13969] [ 9]
>>> /usr/local/lib/libmpi.so.1(ompi_file_close+0x41)[0x7f06a5ea0811]
>>> [oriol-VirtualBox:13969] [10]
>>> /usr/local/lib/libmpi.so.1(PMPI_File_close+0x78)[0x7f06a5edc118]
>>> [oriol-VirtualBox:13969] [11] ./binary[0x42099e]
>>> [oriol-VirtualBox:13969] [12] ./binary[0x48ed86]
>>> [oriol-VirtualBox:13969] [13] ./binary[0x40e49f]
>>> [oriol-VirtualBox:13969] [14]
>>> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f06a5092de5]
>>> [oriol-VirtualBox:13969] [15] ./binary[0x40d679]
>>> [oriol-VirtualBox:13969] *** End of error message ***
>>> [oriol-VirtualBox:13975] [ 0]
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x36ff0)[0x7f1857201ff0]
>>> [oriol-VirtualBox:13975] [ 1]
>>> /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f1857201f77]
>>> [oriol-VirtualBox:13975] [ 2]
>>> /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f18572055e8]
>>> [oriol-VirtualBox:13975] [ 3]
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x744fb)[0x7f185723f4fb]
>>> [oriol-VirtualBox:13975] [ 4]
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f185724b996]
>>> [oriol-VirtualBox:13975] [ 5]
>>> /usr/local/lib/openmpi/mca_io_romio.so(ADIOI_Delete_flattened+0x62)[0x7f18459d2c02]
>>>
>>> [oriol-VirtualBox:13975] [ 6]
>>> /usr/local/lib/openmpi/mca_io_romio.so(ADIO_Close+0x1f9)[0x7f18459b7189]
>>> [oriol-VirtualBox:13975] [ 7]
>>> /usr/local/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0xe8)[0x7f18459a9dd8]
>>>
>>> [oriol-VirtualBox:13975] [ 8]
>>> /usr/local/lib/libmpi.so.1(+0x3a2c6)[0x7f1857ffa2c6]
>>> [oriol-VirtualBox:13975] [ 9]
>>> /usr/local/lib/libmpi.so.1(ompi_file_close+0x41)[0x7f1857ffa811]
>>> [oriol-VirtualBox:13975] [10]
>>> /usr/local/lib/libmpi.so.1(PMPI_File_close+0x78)[0x7f1858036118]
>>> [oriol-VirtualBox:13975] [11] ./binary[0x42099e]
>>> [oriol-VirtualBox:13975] [12] ./binary[0x48ed86]
>>> [oriol-VirtualBox:13975] [13] ./binary[0x40e49f]
>>> [oriol-VirtualBox:13975] [14]
>>> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f18571ecde5]
>>> [oriol-VirtualBox:13975] [15] ./binary[0x40d679]
>>> [oriol-VirtualBox:13975] *** End of error message ***
>>> [oriol-VirtualBox:13972] *** Process received signal ***
>>> [oriol-VirtualBox:13972] Signal: Aborted (6)
>>> [oriol-VirtualBox:13972] Signal code: (-6)
>>> [oriol-VirtualBox:13972] [ 0]
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x36ff0)[0x7f5844a43ff0]
>>> [oriol-VirtualBox:13972] [ 1]
>>> /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f5844a43f77]
>>> [oriol-VirtualBox:13972] [ 2]
>>> /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f5844a475e8]
>>> [oriol-VirtualBox:13972] [ 3]
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x744fb)[0x7f5844a814fb]
>>> [oriol-VirtualBox:13972] [ 4]
>>> /lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7f5844a8d996]
>>> [oriol-VirtualBox:13972] [ 5]
>>> /usr/local/lib/openmpi/mca_io_romio.so(ADIOI_Delete_flattened+0x62)[0x7f58315f2c02]
>>>
>>> [oriol-VirtualBox:13972] [ 6]
>>> /usr/local/lib/openmpi/mca_io_romio.so(ADIO_Close+0x1f9)[0x7f58315d7189]
>>> [oriol-VirtualBox:13972] [ 7]
>>> /usr/local/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0xe8)[0x7f58315c9dd8]
>>>
>>> [oriol-VirtualBox:13972] [ 8]
>>> /usr/local/lib/libmpi.so.1(+0x3a2c6)[0x7f584583c2c6]
>>> [oriol-VirtualBox:13972] [ 9]
>>> /usr/local/lib/libmpi.so.1(ompi_file_close+0x41)[0x7f584583c811]
>>> [oriol-VirtualBox:13972] [10]
>>> /usr/local/lib/libmpi.so.1(PMPI_File_close+0x78)[0x7f5845878118]
>>> [oriol-VirtualBox:13972] [11] ./binary[0x42099e]
>>> [oriol-VirtualBox:13972] [12] ./binary[0x48ed86]
>>> [oriol-VirtualBox:13972] [13] ./binary[0x40e49f]
>>> [oriol-VirtualBox:13972] [14]
>>> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f5844a2ede5]
>>> [oriol-VirtualBox:13972] [15] ./binary[0x40d679]
>>> [oriol-VirtualBox:13972] *** End of error message ***
>>> --------------------------------------------------------------------------
>>>
>>> mpirun noticed that process rank 2 with PID 13969 on node
>>> oriol-VirtualBox exited on signal 6 (Aborted).
>>> --------------------------------------------------------------------------
>>>
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>

-- 
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab http://pstl.cs.uh.edu
Department of Computer Science University of Houston
Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335
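
The two ways of handling empty file views mentioned in this thread (ignoring zero-length blocks while flattening, and skipping the datatype code entirely when the view has size zero) can be sketched roughly as follows. This is only an illustration in Python, not ROMIO or OMPIO code: `flatten_blocks`, `build_file_view`, and the `(offset, length)` tuple representation are hypothetical names assumed for the example.

```python
from typing import List, Tuple

# Hypothetical representation of one datatype block: (byte offset, byte length).
Block = Tuple[int, int]


def flatten_blocks(blocks: List[Block]) -> List[Block]:
    """Flatten a block list, ignoring zero-length blocks.

    Assumption: dropping empty blocks is harmless because they
    contribute no bytes to the file view.
    """
    flat: List[Block] = []
    for offset, length in blocks:
        if length == 0:
            continue  # zero-length block: nothing to read or write
        flat.append((offset, length))
    return flat


def build_file_view(blocks: List[Block]) -> List[Block]:
    """Return the flattened view, short-circuiting when its size is zero.

    Assumption: a process that owns no data gets an empty view and
    never enters the flattening code at all.
    """
    total_size = sum(length for _, length in blocks)
    if total_size == 0:
        return []  # empty view: skip datatype processing entirely
    return flatten_blocks(blocks)


if __name__ == "__main__":
    # A process with real data plus some empty blocks.
    print(build_file_view([(0, 4), (4, 0), (32, 8)]))
    # A process without any matrix block: every block is empty.
    print(build_file_view([(0, 0), (8, 0)]))
```

In this sketch, whichever component later frees the flattened representation only ever sees a list that was built from non-empty blocks, or no list at all. That consistency between the code that builds the list and the code that releases it is what the discussion above suggests matters.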