Hello,

I encountered a bug in Open MPI 1.0.1 using indexed datatypes with MPI_Recv (which seems to be of the "off by one" sort), which was corrected in Open MPI 1.0.2.
It seems to have resurfaced in Open MPI 1.1. I encountered it using different data and did not recognize it immediately, but it seems it can be reproduced using the same simplified test I sent the first time, which I re-attach here just in case.

Here is a summary of the case:

------------------

Each processor reads a file ("data_p0" or "data_p1") giving a list of global element ids. Some elements (vertices from a partitioned mesh) may belong to both processors, so their ids may appear on both processors: we have 7178 global vertices, 3654 and 3688 of them being known by ranks 0 and 1 respectively.

In this simplified version, we assign coordinates {x, y, z} to each vertex equal to its global id number for rank 1, and the negative of that for rank 0 (assigning the same value to x, y, and z). After finishing the "ordered gather", rank 0 prints the global id and coordinates of each vertex.

Lines should print (for example) as:

  6456 ; 6456.00000 6456.00000 6456.00000
  6457 ; -6457.00000 -6457.00000 -6457.00000

depending on whether a vertex belongs only to rank 0 (negative coordinates) or belongs to rank 1 (positive coordinates).

With the OMPI 1.0.1 bug (observed on Suse Linux 10.0 with gcc 4.0 and on Debian sarge with gcc 3.4), we have for example for the last vertices:

  7176 ; 7175.00000 7175.00000 7176.00000
  7177 ; 7176.00000 7176.00000 7177.00000

seeming to indicate an "off by one" type bug in datatype handling.

Not using an indexed datatype (i.e. not defining USE_INDEXED_DATATYPE in the gather_test.c file), the bug disappears.

------------------

Best regards,

Yvan Fournier
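For reference, the result rank 0 is expected to end up with can be sketched in plain C, without MPI. This is not the code from gather_test.c: the function names below are hypothetical, and the sketch assumes 1-based global ids, 3 doubles per vertex, and that rank 1's values are written after rank 0's so that shared vertices end up positive.

```c
#include <assert.h>
#include <stddef.h>

/* Fill 3 coordinates per listed vertex as in the test description:
   x = y = z = sign * global id (sign is -1 for rank 0, +1 for rank 1). */
void fill_local_coords(const int *ids, size_t n, double sign, double *coords)
{
    for (size_t i = 0; i < n; i++)
        for (int k = 0; k < 3; k++)
            coords[3 * i + k] = sign * (double)ids[i];
}

/* Emulate what a receive into an indexed layout should do: block i
   (3 doubles) lands at displacement 3 * (ids[i] - 1) in the global array.
   Assumption: ids are 1-based and lie in [1, n_global]. */
void scatter_blocks(const int *ids, size_t n, const double *coords,
                    size_t n_global, double *global)
{
    for (size_t i = 0; i < n; i++) {
        assert(ids[i] >= 1 && (size_t)ids[i] <= n_global);
        for (int k = 0; k < 3; k++)
            global[3 * (size_t)(ids[i] - 1) + k] = coords[3 * i + k];
    }
}

/* Return the first global id whose x, y, z are not all equal to +id or -id
   (the sign is taken from x), or 0 if every vertex is consistent. */
int first_inconsistent_vertex(const double *global, size_t n_global)
{
    for (size_t v = 0; v < n_global; v++) {
        double id = (double)(v + 1);
        double expected = (global[3 * v] < 0.0) ? -id : id;
        for (int k = 0; k < 3; k++)
            if (global[3 * v + k] != expected)
                return (int)(v + 1);
    }
    return 0;
}
```

With correct datatype handling, first_inconsistent_vertex should return 0 on the gathered array. A line such as "7176 ; 7175.00000 7175.00000 7176.00000" would make it return 7176.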
ompi_datatype_bug.tar.gz
Description: application/compressed-tar