This may be a general MPI question rather than an OpenMPI-specific
question. I apologize if I've picked the wrong mailing list for such,
and in that case I'd welcome being redirected to a more appropriate
forum.
I'm trying to debug a problem in a much more complex system, but
I've boiled it down to a ~100 line MPI-1 test case which exhibits
similarly unexpected behavior. The test case first uses
MPI_Type_struct to create a data type corresponding to a 2D point (two
doubles), then again to create a heterogeneous pair of a single double
preceding the 2D point data type.
If I use a single block to create the inner data type, then the result
works as expected for all operations I've tested.
If I use MPI_LB to indicate a lower bound of 0 on the inner data type
(which I believe should be redundant in this case, but which can be
necessary for more intricate data types in the complex system), then
the result fails.
Specifically, the recursive data type then gives corrupt results when
communicating vectors of these pairs, and even without communication
we can see unexpected behavior by querying its extents: A triplet of
doubles should begin at byte 0 and end at byte 24, but querying
MPI_Type_lb gives a beginning at byte 8 if MPI_LB has been used in the
construction.
Compiling the attached file with mpicxx and running the result
(equivalently, the code pasted at
https://github.com/libMesh/libmesh/issues/631#issuecomment-134800297
in case file attachments get stripped here) demonstrates the problem
on my (64-bit) system. For simplicity the displacements here are
hard-coded rather than portable, but the original MPI_Address and
MPI_Get_address based routines failed in the same way.
Our full system only fails with MPICH2, but that may just have been
luck: with this test case I'm seeing failures with both MPICH2 and
OpenMPI, so I have to assume my own code is at fault. Any help
would be appreciated. If there's anything I can do to make the issue
easier to replicate, please let me know.
Thanks,
---
Roy Stogner
#include <iostream>
#include <utility>
#include <vector>
#include "mpi.h"
#define TEST_UB 0
#define TEST_LB 1
struct MyPoint {
  double val[2];
  MyPoint(double x, double y) {
    val[0] = x;
    val[1] = y;
  }
};
int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);

  int size, myrank;
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

  std::vector<std::pair<double, MyPoint> > vals
    (size, std::make_pair(-1.0, MyPoint(-2.0, -3.0)));
  std::pair<double, MyPoint> inval =
    std::make_pair(myrank+0.75, MyPoint(myrank, myrank+0.25));

  MPI_Datatype pt_type, my_pair_type;
  // WORKS:
#if !TEST_LB && !TEST_UB
  int pt_blocklengths[2] = {1, 1};
  MPI_Datatype pt_types[2] = {MPI_DOUBLE, MPI_DOUBLE};
  MPI_Aint pt_displs[2] = {0, 8};
  MPI_Type_struct (2, pt_blocklengths, pt_displs, pt_types, &pt_type);
#endif

  // WORKS:
#if !TEST_LB && TEST_UB
  int pt_blocklengths[] = {1, 1, 1};
  MPI_Datatype pt_types[] = {MPI_DOUBLE, MPI_DOUBLE, MPI_UB};
  MPI_Aint pt_displs[] = {0, 8, 16};
  MPI_Type_struct (3, pt_blocklengths, pt_displs, pt_types, &pt_type);
#endif

  // FAILS:
#if TEST_LB && !TEST_UB
  int pt_blocklengths[] = {1, 1, 1};
  MPI_Datatype pt_types[] = {MPI_LB, MPI_DOUBLE, MPI_DOUBLE};
  MPI_Aint pt_displs[] = {0, 0, 8};
  MPI_Type_struct (3, pt_blocklengths, pt_displs, pt_types, &pt_type);
#endif

  // FAILS:
#if TEST_LB && TEST_UB
  int pt_blocklengths[4] = {1, 1, 1, 1};
  MPI_Datatype pt_types[4] = {MPI_LB, MPI_DOUBLE, MPI_DOUBLE, MPI_UB};
  MPI_Aint pt_displs[4] = {0, 0, 8, 16};
  MPI_Type_struct (4, pt_blocklengths, pt_displs, pt_types, &pt_type);
#endif
  MPI_Type_commit (&pt_type);

  MPI_Aint pt_bound;
  MPI_Type_lb(pt_type, &pt_bound);
  // PRINTS 0:
  std::cout << "Point LB = " << pt_bound << std::endl;
  MPI_Type_ub(pt_type, &pt_bound);
  // PRINTS 16:
  std::cout << "Point UB = " << pt_bound << std::endl;

  MPI_Datatype types[] = { MPI_DOUBLE, pt_type };
  int blocklengths[] = {1,1};
  MPI_Aint displs[] = {0,8};
  MPI_Type_struct (2, blocklengths, displs, types, &my_pair_type);
  MPI_Type_commit (&my_pair_type);

  MPI_Aint paired_bound;
  MPI_Type_lb(my_pair_type, &paired_bound);
  // FAIL CASE PRINTS 8:
  // SUCCESS CASE PRINTS 0:
  std::cout << "Paired LB = " << paired_bound << std::endl;
  MPI_Type_ub(my_pair_type, &paired_bound);
  // FAIL CASE PRINTS 24:
  // SUCCESS CASE PRINTS 24:
  std::cout << "Paired UB = " << paired_bound << std::endl;

  // Release the derived types and shut MPI down cleanly.
  MPI_Type_free (&my_pair_type);
  MPI_Type_free (&pt_type);
  MPI_Finalize();
  return 0;
}