Roy,

First and foremost, the two datatype markers (MPI_LB and MPI_UB) were deprecated and then removed as of MPI 3.0 for exactly the reason you encountered. Once a datatype is annotated with these markers, they are propagated to every derived type built from it, leading to an unnatural datatype definition. This behavior is mandated by the definition of the typemap, specified by the equation in Section 4.1, page 105, line 18 of the standard. Unfortunately, the only way to circumvent this issue is to set the LB/UB markers explicitly on every newly created datatype.
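To make the propagation concrete, here is a minimal sketch (not your attached test case; the type names and the hard-coded displacements are mine, and it assumes 8-byte doubles) of the construction you describe. The MPI_LB marker placed at displacement 0 inside the inner point type reappears at displacement 8 inside the outer pair type, so MPI_Type_lb reports 8 rather than 0:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Inner type: a 2D point (two doubles) with an explicit LB marker at 0. */
    int          point_blocks[3] = { 1, 1, 1 };
    MPI_Aint     point_disps[3]  = { 0, 0, 8 };
    MPI_Datatype point_types[3]  = { MPI_LB, MPI_DOUBLE, MPI_DOUBLE };
    MPI_Datatype point_type;
    MPI_Type_struct(3, point_blocks, point_disps, point_types, &point_type);
    MPI_Type_commit(&point_type);

    /* Outer type: one double at 0 followed by the point at byte 8. */
    int          pair_blocks[2] = { 1, 1 };
    MPI_Aint     pair_disps[2]  = { 0, 8 };
    MPI_Datatype pair_types[2]  = { MPI_DOUBLE, point_type };
    MPI_Datatype pair_type;
    MPI_Type_struct(2, pair_blocks, pair_disps, pair_types, &pair_type);
    MPI_Type_commit(&pair_type);

    /* The LB marker inherited from point_type now sits at displacement 8,
       so the lower bound of pair_type becomes 8, not 0. */
    MPI_Aint lb;
    MPI_Type_lb(pair_type, &lb);
    printf("lb = %ld\n", (long)lb);   /* prints 8 instead of the expected 0 */

    MPI_Type_free(&pair_type);
    MPI_Type_free(&point_type);
    MPI_Finalize();
    return 0;
}

By the same typemap rule the upper bound stays at 24, so the extent of the outer type shrinks to 16 bytes, which is why vectors of these pairs come out scrambled.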
Thus, to fix your datatype composition, you just have to add an explicit MPI_LB (set to 0) when calling MPI_Type_struct for your second struct datatype (a sketch follows after the quoted message below).

  George.

On Tue, Aug 25, 2015 at 10:57 PM, Roy Stogner <royst...@ices.utexas.edu> wrote:
>
> This may be a general MPI question rather than an OpenMPI-specific
> question.  I apologize if I've picked the wrong mailing list for such,
> and in that case I'd welcome being redirected to a more appropriate
> forum.
>
> I'm trying to debug a problem in a much more complex system, but I've
> boiled it down to a ~100 line MPI-1 test case which exhibits similarly
> unexpected behavior.  The test case first uses MPI_Type_struct to
> create a data type corresponding to a 2D point (two doubles), then
> again to create a heterogeneous pair of a single double preceding the
> 2D point data type.
>
> If I use a single block to create the inner data type, then the result
> works as expected for all operations I've tested.
>
> If I use MPI_LB to indicate a lower bound of 0 on the inner data type
> (which I believe should be redundant in this case, but which can be
> necessary for more intricate data types in the complex system), then
> the result fails.
>
> Specifically, the recursive data type then gives corrupt results when
> communicating vectors of these pairs, and even without communication
> we can see unexpected behavior by querying its extents: a triplet of
> doubles should begin at byte 0 and end at byte 24, but querying
> MPI_Type_lb gives a beginning at byte 8 if MPI_LB has been used in the
> construction.
>
> Running mpicxx on the attached file (equivalently, the code pasted at
> https://github.com/libMesh/libmesh/issues/631#issuecomment-134800297
> in case file attachments get stripped here) demonstrates the problem
> on my (64-bit) system.  For simplicity the displacements here are
> hard-coded rather than portable, but the original MPI_Address and
> MPI_Get_address based routines failed in the same way.
>
> Our full system only fails with MPICH2, but that may just have been
> luck?  With this test case I'm seeing failures with both MPICH2 and
> OpenMPI, and so I've got to assume my own code is at fault.  Any help
> would be appreciated.  If there's anything I can do to make the issue
> easier to replicate, please let me know.
>
> Thanks,
> ---
> Roy Stogner
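P.S. Here is the sketch of the fix mentioned above. It is a hand-written illustration rather than your actual code (the names and hard-coded displacements are mine): replace the outer MPI_Type_struct call from the sketch earlier in this mail with one that carries its own MPI_LB entry at displacement 0, which overrides the marker inherited from the inner type.

/* Outer type with an explicit LB marker of its own at displacement 0. */
int          pair_blocks[3] = { 1, 1, 1 };
MPI_Aint     pair_disps[3]  = { 0, 0, 8 };
MPI_Datatype pair_types[3]  = { MPI_LB, MPI_DOUBLE, point_type };
MPI_Datatype pair_type;
MPI_Type_struct(3, pair_blocks, pair_disps, pair_types, &pair_type);
MPI_Type_commit(&pair_type);

MPI_Aint lb;
MPI_Type_lb(pair_type, &lb);   /* now reports 0 */

With the marker in place the lower bound is back at 0 and the extent becomes the expected 24 bytes, so sending vectors of these pairs should stride correctly again.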