Hello,
I'm using a custom datatype created through MPI_Type_create_struct() to
send data with a dynamic structure to another process on the same node over
shared memory, and noticed it's much slower than expected.
I ran a profile, and it looks like it's not using CMA zero-copy, falling
back to us

Zero copy (CMA) does not work with non-contiguous datatypes: it would require
both processes to know the memory layout used by the peer. As long as the
memory layout described by the type can be seen as contiguous (even if the
type describes it otherwise), it should work just fine.
George.
On Tue, Apr 23, 2024