I read the MPICH trac ticket you pointed to, and your analysis seems pertinent.
My patch for “count = 0” had a similar outcome to what you describe: it removed
all references to the datatype when the count was zero, without looking for the
special markers.
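
To make the failure mode concrete, here is a toy C sketch of that pattern. It is
purely illustrative: the element kinds and fields below are made up, and this is
not the actual Open MPI datatype code. A filter that drops every zero-count
entry also drops the end-of-loop marker that carries the extent, while a
marker-aware filter keeps it.

#include <stdio.h>

enum kind { LOOP, DATA, END_LOOP };

struct elem {
    enum kind kind;
    int       count;    /* repetitions; 0 is legal for data blocks   */
    long      extent;   /* carried on the END_LOOP marker (in bytes) */
};

/* Buggy filter: drops every element whose count is 0, which also
 * discards the END_LOOP marker describing the loop's extent. */
static int strip_zero_count(const struct elem *in, int n, struct elem *out)
{
    int m = 0;
    for (int i = 0; i < n; i++)
        if (in[i].count != 0)          /* markers have count 0 too! */
            out[m++] = in[i];
    return m;
}

/* Marker-aware filter: only data blocks may be dropped. */
static int strip_zero_count_keep_markers(const struct elem *in, int n,
                                         struct elem *out)
{
    int m = 0;
    for (int i = 0; i < n; i++)
        if (in[i].kind != DATA || in[i].count != 0)
            out[m++] = in[i];
    return m;
}

int main(void)
{
    struct elem desc[] = {
        { LOOP,     2, 0  },
        { DATA,     0, 0  },    /* empty block on this rank            */
        { END_LOOP, 0, 40 },    /* marker: extent of the loop is 40 B  */
    };
    struct elem out[3];

    printf("zero-count filter keeps %d element(s)\n",
           strip_zero_count(desc, 3, out));              /* 1: marker lost */
    printf("marker-aware filter keeps %d element(s)\n",
           strip_zero_count_keep_markers(desc, 3, out)); /* 2 */
    return 0;
}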

Let me try to come up with a fix.

 Thanks,
   George.


On May 8, 2014, at 17:08, Rob Latham <r...@mcs.anl.gov> wrote:

> 
> 
> On 05/07/2014 11:36 AM, Rob Latham wrote:
>> 
>> 
>> On 05/05/2014 09:20 PM, Richard Shaw wrote:
>>> Hello,
>>> 
>>> I think I've come across a bug when using ROMIO to read in a 2D
>>> distributed array. I've attached a test case to this email.
>> 
>> Thanks for the bug report and the test case.
>> 
>> I've opened an MPICH bug (because this is ROMIO's fault, not OpenMPI's
>> fault... until I can prove otherwise! :>)
> 
> This bug appears to be OpenMPI's fault now.
> 
> I'm looking at OpenMPI's "pulled it from git an hour ago" version, and 
> ROMIO's flattening code overruns arrays: the OpenMPI datatype processing 
> routines return too few blocks for ranks 1 and 3.
> 
> Michael Raymond told me off-list "I tracked this down to MPT not marking
> HVECTORs / STRUCTs with 0-sized counts as contiguous. Once I changed this,
> the memory corruption and the data mismatches both went away."  Could
> something similar be happening in OpenMPI?
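
A minimal standard-MPI probe of this zero-count case could look like the sketch
below (the stride and the printed quantities are my own choices, not something
from the original report): it builds a zero-count HVECTOR, wraps it in a STRUCT,
the kind of subtype a darray can hand to a rank that owns no blocks in one
dimension, and prints the size and extent the implementation reports.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Zero-count hvector of doubles, roughly the shape a darray type can
     * produce for a rank that owns no blocks in one of the dimensions. */
    MPI_Datatype hvec, full;
    MPI_Type_create_hvector(0, 1, 8 * (MPI_Aint)sizeof(double), MPI_DOUBLE,
                            &hvec);

    int          blens[1] = { 1 };
    MPI_Aint     disps[1] = { 0 };
    MPI_Datatype types[1] = { hvec };
    MPI_Type_create_struct(1, blens, disps, types, &full);
    MPI_Type_commit(&full);

    int      size;
    MPI_Aint lb, extent;
    MPI_Type_size(full, &size);
    MPI_Type_get_extent(full, &lb, &extent);
    printf("zero-count hvector in a struct: size=%d lb=%ld extent=%ld\n",
           size, (long)lb, (long)extent);

    MPI_Type_free(&hvec);
    MPI_Type_free(&full);
    MPI_Finalize();
    return 0;
}
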
> 
> ==rob
> 
>> 
>> http://trac.mpich.org/projects/mpich/ticket/2089
>> 
>> ==rob
>> 
>>> 
>>> In the test case I first initialise an array of 25 doubles (which will be
>>> a 5x5 grid), then I create a datatype representing a 5x5 matrix
>>> distributed in 3x3 blocks over a 2x2 process grid. As a reference I use
>>> MPI_Pack to pull out the block-cyclic array elements local to each
>>> process (which I think is correct). Then I write the original array of
>>> 25 doubles to disk, and use MPI-IO to read it back in (performing the
>>> Open, Set_view, and Read_all), and compare to the reference.
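
For readers without the attachment, here is a rough sketch of what such a test
might look like (the file name, ordering, and error handling are my guesses;
the real darr_read.x may differ in its details):

#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define N 25   /* 5x5 global array of doubles */

int main(int argc, char **argv)
{
    int rank, nproc;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
    if (nproc != 4) {                     /* 2x2 process grid expected */
        if (rank == 0) fprintf(stderr, "run with -np 4\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    double global[N];
    for (int i = 0; i < N; i++) global[i] = (double)i;

    /* 5x5 array, cyclic(3) x cyclic(3) distribution over a 2x2 grid. */
    int gsizes[2]   = { 5, 5 };
    int distribs[2] = { MPI_DISTRIBUTE_CYCLIC, MPI_DISTRIBUTE_CYCLIC };
    int dargs[2]    = { 3, 3 };
    int psizes[2]   = { 2, 2 };
    MPI_Datatype darray;
    MPI_Type_create_darray(nproc, rank, 2, gsizes, distribs, dargs,
                           psizes, MPI_ORDER_C, MPI_DOUBLE, &darray);
    MPI_Type_commit(&darray);

    /* Number of local elements, from the type size. */
    int tsize, nlocal;
    MPI_Type_size(darray, &tsize);
    nlocal = tsize / (int)sizeof(double);

    /* Reference: pack the local elements straight out of the array. */
    double packed[N];
    int    pos = 0;
    MPI_Pack(global, 1, darray, packed, (int)sizeof(packed), &pos,
             MPI_COMM_WORLD);

    /* Rank 0 writes the plain 25 doubles, then all ranks read them back
     * through the darray file view. */
    MPI_File fh;
    if (rank == 0) {
        MPI_File_open(MPI_COMM_SELF, "darr.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_File_write(fh, global, N, MPI_DOUBLE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);
    }
    MPI_Barrier(MPI_COMM_WORLD);

    double readbuf[N];
    MPI_File_open(MPI_COMM_WORLD, "darr.dat", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_DOUBLE, darray, "native", MPI_INFO_NULL);
    MPI_File_read_all(fh, readbuf, nlocal, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    printf("Rank %d (%d elements): %s\n", rank, nlocal,
           memcmp(packed, readbuf, nlocal * sizeof(double)) ? "MISMATCH"
                                                            : "match");

    MPI_Type_free(&darray);
    MPI_Finalize();
    return 0;
}
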
>>> 
>>> Running this with OMPI, the two match on all ranks.
>>> 
>>> > mpirun -mca io ompio -np 4 ./darr_read.x
>>> === Rank 0 === (9 elements)
>>> Packed:  0.0  1.0  2.0  5.0  6.0  7.0 10.0 11.0 12.0
>>> Read:    0.0  1.0  2.0  5.0  6.0  7.0 10.0 11.0 12.0
>>> 
>>> === Rank 1 === (6 elements)
>>> Packed: 15.0 16.0 17.0 20.0 21.0 22.0
>>> Read:   15.0 16.0 17.0 20.0 21.0 22.0
>>> 
>>> === Rank 2 === (6 elements)
>>> Packed:  3.0  4.0  8.0  9.0 13.0 14.0
>>> Read:    3.0  4.0  8.0  9.0 13.0 14.0
>>> 
>>> === Rank 3 === (4 elements)
>>> Packed: 18.0 19.0 23.0 24.0
>>> Read:   18.0 19.0 23.0 24.0
>>> 
>>> 
>>> 
>>> However, using ROMIO the two differ on two of the ranks:
>>> 
>>> > mpirun -mca io romio -np 4 ./darr_read.x
>>> === Rank 0 === (9 elements)
>>> Packed:  0.0  1.0  2.0  5.0  6.0  7.0 10.0 11.0 12.0
>>> Read:    0.0  1.0  2.0  5.0  6.0  7.0 10.0 11.0 12.0
>>> 
>>> === Rank 1 === (6 elements)
>>> Packed: 15.0 16.0 17.0 20.0 21.0 22.0
>>> Read:    0.0  1.0  2.0  0.0  1.0  2.0
>>> 
>>> === Rank 2 === (6 elements)
>>> Packed:  3.0  4.0  8.0  9.0 13.0 14.0
>>> Read:    3.0  4.0  8.0  9.0 13.0 14.0
>>> 
>>> === Rank 3 === (4 elements)
>>> Packed: 18.0 19.0 23.0 24.0
>>> Read:    0.0  1.0  0.0  1.0
>>> 
>>> 
>>> 
>>> My interpretation is that the behaviour with OMPIO is correct.
>>> Interestingly everything matches up using both ROMIO and OMPIO if I set
>>> the block shape to 2x2.
>>> 
>>> This was run on OS X using OpenMPI 1.8.2a1r31632. I have also run this on
>>> Linux with OpenMPI 1.7.4, where OMPIO is still correct, but using ROMIO I
>>> just get segfaults.
>>> 
>>> Thanks,
>>> Richard
>>> 
>>> 
>> 
> 
> -- 
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
