Joel,

I took a look at your code and found the error. Basically, it's a datatype problem: the datatype as described in your program does not correspond to the one you expect to see in practice. You forgot to set the correct extent.

Let me show you the problem. Let's assume 2 processes and the default values from your program. The original matrix (at the root) is:
root    0.000000        1.000000        2.000000
root    3.000000        4.000000        5.000000
root    6.000000        7.000000        8.000000
root    9.000000        10.000000       11.000000
root    12.000000       13.000000       14.000000
root    15.000000       16.000000       17.000000
root    18.000000       19.000000       20.000000
root    21.000000       22.000000       23.000000
root    24.000000       25.000000       26.000000
root    27.000000       28.000000       29.000000

And your datatype is vector(5, 1, 3, MPI_DOUBLE). If you look at the definition of the vector type in the MPI standard, you will notice that the datatype ends at the last element of the vector and does not add any gap at the end. Thus the extent of your datatype is 13 doubles [(5 - 1) * 3 + 1]. Here is the memory covered by one such element:
root    0.000000        1.000000        2.000000
root    3.000000        4.000000        5.000000
root    6.000000        7.000000        8.000000
root    9.000000        10.000000       11.000000
root    12.000000

Then if you consider a memory layout containing 2 such datatypes (as the scatter does), the second one will start at element 13, not at element 15 as you expect.
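You can verify this directly. Below is a minimal sketch (my own code, not taken from your program or the attachment) that builds exactly this vector type and asks the library for its extent via the MPI-2 call MPI_Type_get_extent:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);

    /* The same vector type as in your program with the default values:
       5 blocks of 1 double each, with a stride of 3 doubles. */
    MPI_Datatype scatter;
    MPI_Type_vector(5, 1, 3, MPI_DOUBLE, &scatter);
    MPI_Type_commit(&scatter);

    MPI_Aint lb, extent;
    MPI_Type_get_extent(scatter, &lb, &extent);
    /* Should print: lb = 0, extent = 104 bytes (13 doubles). */
    printf("lb = %ld, extent = %ld bytes (%ld doubles)\n",
           (long)lb, (long)extent, (long)(extent / sizeof(double)));

    MPI_Type_free(&scatter);
    MPI_Finalize();
    return 0;
}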

Now, if you need the second datatype to start at element 15, you have to extend the type to cover the whole last line, including elements 13 and 14. You can use MPI_UB or MPI_Type_create_resized (depending on whether you want MPI-1 or MPI-2). Attached you will find a C program that does exactly what you expect. You can define MORE_OUTPUT to see exactly how your matrices get filled at each step.
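The attachment is not reproduced inline, but a minimal sketch of the MPI-2 approach might look like the following (the variable names are mine and not necessarily those of the attached file; the sketch assumes the default 2-process case, so n_local = 5 and d = 3):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
    int n_global = 10, d = 3, my_rank, n_procs, i, k, root = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &n_procs);
    int n_local = n_global / n_procs;
    double A_global[n_global][d], A_local[n_local][d];

    /* The original column type: extent (n_local - 1) * d + 1 doubles. */
    MPI_Datatype vec, col;
    MPI_Type_vector(n_local, 1, d, MPI_DOUBLE, &vec);
    /* Stretch the extent to n_local * d doubles, so the second element
       of the scatter starts at element 15 instead of 13. */
    MPI_Type_create_resized(vec, 0, n_local * d * sizeof(double), &col);
    MPI_Type_commit(&col);

    /* Only the root fills the global matrix. */
    if (my_rank == root)
        for (i = 0; i < n_global; i++)
            for (k = 0; k < d; k++)
                A_global[i][k] = i * d + k;

    /* Scatter one column at a time using the resized type. */
    for (k = 0; k < d; k++)
        MPI_Scatter(&(A_global[0][k]), 1, col,
                    &(A_local[0][k]), 1, col, root, MPI_COMM_WORLD);

    for (i = 0; i < n_local; i++) {
        for (k = 0; k < d; k++)
            printf("%f\t", A_local[i][k]);
        printf("\n");
    }

    MPI_Type_free(&col);
    MPI_Type_free(&vec);
    MPI_Finalize();
    return 0;
}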

  george.

Attachment: mpi_test.c


PS: I was unable to compile any of the codes you attached to your email, so I wrote the program starting from your code and your description. I hope it answers your question.

On Aug 4, 2005, at 3:04 PM, Joel Eaves wrote:

Hi group. I posted a general MPI question a while ago to the MPI newsgroup but didn't get a response. I need to figure this out, so I thought I would try it on you.

I have written a piece of code that fills a 2D array sequentially so that I can keep track of which elements are being dropped in the message passing. I use the type_vector datatype to generate a datatype for passing the columns. In C, I can see that the scatter operation passes the first matrix to process 0 correctly, but the second matrix, going to process 1, is screwed up because the elements are shifted back by two. In other words, the second matrix begins with the lucky 13th element instead of the 15th like it should, and there is overlap -- the same elements appear in both of the scattered matrices. The C++ code goes over like a lead balloon: the operation is clearly asking for data outside the range of the filled matrix, so the values of the scattered matrix are all screwed up. I am using LAM/MPI v7.1.1 on Mac OS X 10.3.8 with gcc 3.3. I got similar results using MPICH2 on Linux. Here's the version using the C bindings:

#include <mpi.h>
#include <iostream>
using namespace std;

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);
    int my_rank = MPI::COMM_WORLD.Get_rank(), n_global = 10,
        n_procs = MPI::COMM_WORLD.Get_size(),
        d = 3, n_local = n_global / n_procs, i, k, root = 0;
    double A_global[n_global][d], A_local[n_local][d];

    /* One column: n_local blocks of 1 double each, stride d. */
    MPI_Datatype scatter;
    MPI_Type_vector(n_local, 1, d, MPI_DOUBLE, &scatter);
    MPI_Type_commit(&scatter);

    /* Only the root fills the global matrix. */
    if (my_rank == root)
        for (i = 0; i < n_global; i++)
            for (k = 0; k < d; k++)
                A_global[i][k] = i * d + k;

    /* Scatter is collective, so every rank calls it. */
    for (k = 0; k < d; k++)
        MPI_Scatter(&(A_global[0][k]), 1, scatter,
                    &(A_local[0][k]), 1, scatter, root, MPI_COMM_WORLD);

    for (i = 0; i < n_local; i++) {
        for (k = 0; k < d; k++)
            cout << A_local[i][k] << "\t";
        cout << endl;
    }

    MPI_Finalize();
    return 0;
}

In C++, the code is:

#include <mpi.h>
#include <iostream>
using namespace std;

int main(int argc, char* argv[]) {
    MPI::Init();
    int my_rank = MPI::COMM_WORLD.Get_rank(), n_global = 10,
        n_procs = MPI::COMM_WORLD.Get_size(),
        d = 3, n_local = n_global / n_procs, i, k, root = 0;
    double A_global[n_global][d], A_local[n_local][d];

    MPI::Datatype scatter(MPI::DOUBLE);
    // Note: Create_vector returns the new datatype rather than
    // modifying 'scatter' in place; the result is discarded here.
    scatter.Create_vector(n_local, 1, d);
    scatter.Commit();

    if (my_rank == root)
        for (i = 0; i < n_global; i++)
            for (k = 0; k < d; k++)
                A_global[i][k] = i * d + k;

    for (k = 0; k < d; k++)
        MPI::COMM_WORLD.Scatter(&(A_global[0][k]), 1, scatter,
                                &(A_local[0][k]), 1, scatter, root);

    for (i = 0; i < n_local; i++) {
        for (k = 0; k < d; k++)
            cout << A_local[i][k] << "\t";
        cout << endl;
    }

    MPI::Finalize();
    return 0;
}

I'm running the process (after a lamboot) with the command
mpirun -np 2 scatter.out

and compiling with the command

mpic++ Scatter.cpp -o scatter.out

Can anyone help out with this? I don't understand why the C++ calls return erroneous results that are *different* from those of the C program.

Thanks,

Joel