Yes, this is absolutely normal. When you give MPI non-contiguous data it has to break the transfer down into one operation per contiguous region. On a non-RDMA network this can lead to very poor performance. With an RDMA network it will still be much slower than a contiguous get, although the overhead per network operation is lower.
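To make that concrete, here is a minimal sketch (my own, not the code from the attached benchmark; the struct and function names are made up) of how a {double, int} struct is typically described to MPI. Because the committed type skips the trailing padding, each element is its own contiguous region, which is why a get with this type decomposes into one operation per element:

#include <mpi.h>
#include <cstddef>   // offsetof

// Hypothetical layout from the original post: 12 bytes of data followed by
// 4 bytes of compiler-inserted padding (total extent 16 bytes).
struct PaddedElem { double d; int i; };

// Sketch of a typical datatype for PaddedElem. The committed type describes
// only the double and the int, i.e. 12 of the 16 bytes of each element, so
// to MPI every element is a separate contiguous region. A {double, int, int}
// struct would cover its full 16-byte extent and be treated as contiguous.
MPI_Datatype make_padded_type()
{
    int          blocklens[2] = {1, 1};
    MPI_Aint     disps[2]     = {offsetof(PaddedElem, d), offsetof(PaddedElem, i)};
    MPI_Datatype fields[2]    = {MPI_DOUBLE, MPI_INT};

    MPI_Datatype tmp, result;
    MPI_Type_create_struct(2, blocklens, disps, fields, &tmp);
    // Match the extent to sizeof(PaddedElem) so consecutive elements line up;
    // the 4 padding bytes are still *not* part of the type.
    MPI_Type_create_resized(tmp, 0, sizeof(PaddedElem), &result);
    MPI_Type_commit(&result);
    MPI_Type_free(&tmp);
    return result;
}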
-Nathan

> On Mar 30, 2023, at 10:43 AM, Antoine Motte via users <users@lists.open-mpi.org> wrote:
>
> Hello everyone,
>
> I recently had to code an MPI application where I send std::vector contents in a distributed environment. In order to try different approaches I coded both 1-sided and 2-sided point-to-point communication schemes: the first one uses MPI_Window and MPI_Get, the second one uses MPI_SendRecv.
>
> I had a hard time figuring out why my implementation with MPI_Get was between 10 and 100 times slower, and I finally found out that MPI_Get is abnormally slow when one tries to send custom datatypes that include padding.
>
> A short example is attached, where I send a struct {double, int} (12 bytes of data + 4 bytes of padding) vs a struct {double, int, int} (16 bytes of data, 0 bytes of padding) with both MPI_SendRecv and MPI_Get. I got these results:
>
> mpirun -np 4 ./compareGetWithSendRecv
> {double, int} SendRecv : 0.0303547 s
> {double, int} Get : 1.9196 s
> {double, int, int} SendRecv : 0.0164659 s
> {double, int, int} Get : 0.0147757 s
>
> I ran it with both Open MPI 4.1.2 and Intel MPI 2021.6 and got the same results.
>
> Is this result normal? Do I have any solution other than adding garbage at the end of the struct or at the end of the MPI_Datatype to avoid padding?
>
> Regards,
>
> Antoine Motte
>
> <compareGetWithSendRecv.cpp>
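For anyone landing on this thread later, here is a self-contained sketch of the kind of one-sided exchange being timed. It is my own reconstruction, not the attached compareGetWithSendRecv.cpp; the element count and values are arbitrary. Rank 1 pulls a vector of padded structs out of rank 0's window with MPI_Get, which is the slow case discussed above.

// Illustrative reconstruction only -- not the attached benchmark.
// Run with at least 2 ranks, e.g. "mpirun -np 2 ./a.out".
#include <mpi.h>
#include <cstddef>
#include <cstdio>
#include <vector>

struct Elem { double d; int i; };   // 12 bytes of data + 4 bytes of padding

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Same datatype construction as in the earlier sketch: the type skips
    // the 4 padding bytes, so it is non-contiguous from MPI's point of view.
    int          blocklens[2] = {1, 1};
    MPI_Aint     disps[2]     = {offsetof(Elem, d), offsetof(Elem, i)};
    MPI_Datatype fields[2]    = {MPI_DOUBLE, MPI_INT};
    MPI_Datatype tmp, elem_t;
    MPI_Type_create_struct(2, blocklens, disps, fields, &tmp);
    MPI_Type_create_resized(tmp, 0, sizeof(Elem), &elem_t);
    MPI_Type_commit(&elem_t);
    MPI_Type_free(&tmp);

    const int n = 100000;                       // arbitrary element count
    std::vector<Elem> data(n, Elem{1.0, 2});
    std::vector<Elem> recv(n);

    MPI_Win win;
    MPI_Win_create(data.data(), (MPI_Aint)(n * sizeof(Elem)), (int)sizeof(Elem),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (rank == 1) {
        // Non-contiguous origin and target datatypes: many implementations
        // turn this into one network operation per element, which is where
        // the reported 10-100x slowdown comes from.
        MPI_Get(recv.data(), n, elem_t, /*target rank*/ 0, /*target disp*/ 0,
                n, elem_t, win);
    }
    MPI_Win_fence(0, win);

    if (rank == 1)
        std::printf("rank 1 got {%g, %d}\n", recv[0].d, recv[0].i);

    MPI_Win_free(&win);
    MPI_Type_free(&elem_t);
    MPI_Finalize();
    return 0;
}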