Hello,

I seem to have encountered a bug in Open MPI 1.0 when using indexed datatypes with MPI_Recv (it appears to be of the "off by one" sort). I have attached a test case, which is briefly explained below (as well as in the source file). This case should be run on two processes. I observed the bug on two different Linux systems (single-processor Centrino under SUSE 10.0 with gcc 4.0.2, dual-processor Xeon under Debian Sarge with gcc 3.4) with Open MPI 1.0.1, and do not observe it using LAM 7.1.1 or MPICH2.
Here is a summary of the case:

------------------

Each processor reads a file ("data_p0" or "data_p1") giving a list of global element ids. Some elements (vertices from a partitioned mesh) may belong to both processors, so their ids may appear on both processors: we have 7178 global vertices, 3654 and 3688 of them being known by ranks 0 and 1 respectively.

In this simplified version, we assign coordinates {x, y, z} to each vertex equal to its global id number for rank 1, and the negative of that for rank 0 (assigning the same value to x, y, and z). After finishing the "ordered gather", rank 0 prints the global id and coordinates of each vertex.

Lines should print (for example) as:

  6456 ; 6456.00000 6456.00000 6456.00000
  6457 ; -6457.00000 -6457.00000 -6457.00000

depending on whether a vertex belongs only to rank 0 (negative coordinates) or belongs to rank 1 (positive coordinates).

With the OMPI 1.0.1 bug (observed on SUSE Linux 10.0 with gcc 4.0 and on Debian Sarge with gcc 3.4), we have for example for the last vertices:

  7176 ; 7175.00000 7175.00000 7176.00000
  7177 ; 7176.00000 7176.00000 7177.00000

which seems to indicate an "off by one" type bug in datatype handling.

When not using an indexed datatype (i.e. not defining USE_INDEXED_DATATYPE in the gather_test.c file), the bug disappears. Using the indexed datatype with LAM/MPI 7.1.1 or MPICH2, we do not reproduce the bug either, so it does seem to be an Open MPI issue.

------------------

Best regards,

Yvan Fournier
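For reference, the result expected from the ordered gather can be modeled in plain C without any MPI calls. This is only a sketch and not the code in gather_test.c: the function names, the 1-based ids, the displacement 3 * (id - 1), and the tiny 6-vertex mesh are all assumptions made for illustration. It mimics what a receive through an indexed layout of 3-double blocks should do, and checks that every vertex ends up with x = y = z = +id or -id.

```c
#include <stddef.h>

#define STRIDE 3 /* x, y, z per vertex */

/* Fill local coordinates as in the summary: +id on rank 1, -id on rank 0. */
void fill_local_coords(const int *gids, size_t n, int rank, double *coords)
{
    double sign = (rank == 0) ? -1.0 : 1.0;
    for (size_t i = 0; i < n; i++)
        for (int k = 0; k < STRIDE; k++)
            coords[i * STRIDE + k] = sign * (double)gids[i];
}

/* Model of a receive through an indexed layout (assumed, hypothetical):
   packed block i lands at displacement 3 * (gids[i] - 1) in the global array. */
void scatter_blocks(const double *packed, const int *gids, size_t n,
                    double *global)
{
    for (size_t i = 0; i < n; i++) {
        size_t displ = (size_t)(gids[i] - 1) * STRIDE;
        for (int k = 0; k < STRIDE; k++)
            global[displ + k] = packed[i * STRIDE + k];
    }
}

/* A vertex is consistent if x == y == z and the value is +id or -id. */
int vertex_is_consistent(int gid, const double *c)
{
    double id = (double)gid;
    return c[0] == c[1] && c[1] == c[2] && (c[0] == id || c[0] == -id);
}

/* Small stand-in for the two-rank case: 6 global vertices, ids 3 and 4
   shared. Returns the number of inconsistent vertices (0 when correct). */
int ordered_gather_demo(void)
{
    const int gids0[] = {1, 2, 3, 4};
    const int gids1[] = {3, 4, 5, 6};
    double local0[4 * STRIDE], local1[4 * STRIDE], global[6 * STRIDE];
    int bad = 0;

    fill_local_coords(gids0, 4, 0, local0);
    fill_local_coords(gids1, 4, 1, local1);
    scatter_blocks(local0, gids0, 4, global); /* rank 0's own values */
    scatter_blocks(local1, gids1, 4, global); /* values received from rank 1 */

    for (int v = 0; v < 6; v++)
        if (!vertex_is_consistent(v + 1, global + (size_t)v * STRIDE))
            bad++;
    return bad;
}
```

With a correct scatter, ordered_gather_demo() returns 0. A line such as "7176 ; 7175.00000 7175.00000 7176.00000" fails vertex_is_consistent(), since the three coordinates differ, which is the symptom reported above.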
ompi_datatype_bug.tar.gz
Description: application/compressed-tar