Dear All,
I would appreciate some general advice on how to efficiently implement
the following scenario.
I am looking into how to send a large amount of data over IB _once_, to
multiple receivers. The catch is, of course, that while the ping-pong
benchmark delivers great bandwidth, it does so by reusing already-registered
memory buffers. Since I need to send the data only once, the
memory registration penalty is not easily avoided. I've been looking
into the following approaches:
1. have multiple ranks send different parts of the data to different
receivers, in the hope that the memory registration cost will be hidden
2. pre-register two smaller buffers, into which the data is copied before
sending
The first approach is the best I've managed so far, but the bandwidth
reached is still lower than what I observe with the ping-pong benchmark.
Also, the performance depends on the number of sending ranks and drops
if there are too many.
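
For reference, what each sending rank does in the first approach is roughly
the sketch below (simplified; how the payload is made visible to all sending
ranks, e.g. through a shared-memory window, and how senders are paired with
receivers are left out, and the names are just placeholders):

  #include <stddef.h>
  #include <mpi.h>

  void multi_rank_send(const char *data, size_t total,
                       int my_slice, int nslices, int peer, MPI_Comm comm)
  {
      /* split the payload into nslices contiguous slices, one per sending rank */
      size_t slice = (total + (size_t)nslices - 1) / (size_t)nslices;
      size_t off   = (size_t)my_slice * slice;
      size_t n     = (off < total) ? ((total - off < slice) ? total - off : slice) : 0;

      /* each sending rank pushes only its own slice to its assigned receiver,
       * so the registration cost of the slices is paid in parallel */
      MPI_Send(n ? data + off : data, (int)n, MPI_CHAR, peer, 0, comm);
  }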
In the second approach one pays for an extra data copy. My thinking was
that, since the effective memory bandwidth of a single modern CPU is
larger than the IB bandwidth, I could squeeze out some performance by
combining double buffering and multithreading, e.g.:
Step 1. thread A sends the data in the current buffer; behind the
scenes, thread B copies the next portion of the data into the other buffer
Step 2. the buffers are switched
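
In code, the loop I have in mind looks roughly like the sketch below
(simplified; the 4 MiB chunk size is just an example, and I assume MPI was
initialized with at least MPI_THREAD_SERIALIZED, since the send may be
issued from a non-main thread):

  #include <string.h>
  #include <mpi.h>

  #define CHUNK (4u << 20)                      /* example 4 MiB staging chunk */

  /* two staging buffers reused for every send, so the MPI library can cache
   * their registration after the first transfer */
  static char buf[2][CHUNK];

  void send_double_buffered(const char *data, size_t total, int dst, MPI_Comm comm)
  {
      size_t off = 0, n = total < CHUNK ? total : CHUNK;
      int cur = 0;

      memcpy(buf[cur], data, n);                /* prime the first buffer */

      while (off < total) {
          size_t next_off = off + n;
          size_t next_n   = (total - next_off) < CHUNK ? (total - next_off) : CHUNK;

          #pragma omp parallel sections num_threads(2)
          {
              #pragma omp section               /* thread A: send the current buffer */
              MPI_Send(buf[cur], (int)n, MPI_CHAR, dst, 0, comm);

              #pragma omp section               /* thread B: stage the next chunk */
              if (next_off < total)
                  memcpy(buf[1 - cur], data + next_off, next_n);
          }

          off = next_off;
          n   = next_n;
          cur = 1 - cur;                        /* step 2: switch the buffers */
      }
  }

The hope is that the memcpy into buf[1-cur] is fully overlapped with the
MPI_Send from buf[cur].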
A similar idea would be to use MPI_Get on the remote rank. The sender
would copy the data into the second buffer while the RMA window with the
first buffer is exposed. In theory, I would expect these two operations
to proceed concurrently, with the memory copy hopefully hidden behind
the IB transfer.
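
The RMA version I tried looks roughly like this (again a simplified sketch:
fence synchronization, a placeholder chunk size, and only the root's window
memory is actually used as the staging area):

  #include <string.h>
  #include <mpi.h>

  #define CHUNK (4u << 20)                     /* example 4 MiB staging chunk */

  /* root reads from src; every other rank assembles the payload into dst */
  void bcast_via_get(const char *src, char *dst, size_t total,
                     int root, MPI_Comm comm)
  {
      int rank;
      MPI_Comm_rank(comm, &rank);

      char (*buf)[CHUNK];                      /* two staging halves in one window */
      MPI_Win win;
      MPI_Win_allocate(2 * (MPI_Aint)CHUNK, 1, MPI_INFO_NULL, comm, &buf, &win);

      size_t nchunks = (total + CHUNK - 1) / CHUNK;
      if (rank == root)                        /* prime the first half */
          memcpy(buf[0], src, total < CHUNK ? total : CHUNK);

      MPI_Win_fence(0, win);
      for (size_t c = 0; c < nchunks; c++) {
          int    cur = (int)(c % 2);
          size_t off = c * CHUNK;
          size_t n   = (total - off) < CHUNK ? (total - off) : CHUNK;

          if (rank != root)                    /* receivers pull the exposed half */
              MPI_Get(dst + off, (int)n, MPI_CHAR, root,
                      (MPI_Aint)cur * (MPI_Aint)CHUNK, (int)n, MPI_CHAR, win);

          if (rank == root && c + 1 < nchunks) {   /* overlap: stage the next half */
              size_t noff = off + n;
              size_t nn   = (total - noff) < CHUNK ? (total - noff) : CHUNK;
              memcpy(buf[1 - cur], src + noff, nn);
          }

          MPI_Win_fence(0, win);               /* complete the Gets; halves switch */
      }
      MPI_Win_free(&win);
  }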
Of course, the experiments didn't really work out. While the first
(multi-rank) approach is OK and shows some improvement, the bandwidth
still falls short of the ping-pong numbers. None of my double-buffering
variants helped at all, possibly because of memory bandwidth contention.
So I was wondering: have any of you had experience with similar
scenarios? What would you recommend as the best approach?
Thanks a lot!
Marcin
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users