Dear All,

I would appreciate some general advice on how to efficiently implement the following scenario.

I am looking into how to send a large amount of data over IB _once_, to multiple receivers. The trick is, of course, that while the ping-pong benchmark delivers great bandwidth, it does so by re-using the already registered memory buffers. Since I need to send the data once, the memory registration penalty is not easily avoided. I've been looking into the following approaches:

1. have multiple ranks send different parts of the data to different receivers, in the hope that the memory registration cost will be hidden (a rough sketch of what I mean follows this list)
2. pre-register two smaller buffers, into which the data is copied before sending
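For reference, a stripped-down sketch of what I mean by approach 1 -- the rank layout, the 64 MiB slice size and the buffer handling are placeholders, not my actual benchmark:

/* Approach 1 sketch: the first half of the ranks act as senders, the
 * second half as receivers, and each sender pushes its own slice of the
 * payload to a dedicated receiver. The hope is that the per-connection
 * registration cost is amortized across the senders. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int nsenders = nprocs / 2;           /* first half sends        */
    const int slice    = 64 * 1024 * 1024;     /* 64 MiB slice per sender */
    char *buf = malloc(slice);

    if (rank < nsenders) {
        MPI_Send(buf, slice, MPI_BYTE, nsenders + rank, 0, MPI_COMM_WORLD);
    } else if (rank - nsenders < nsenders) {
        MPI_Recv(buf, slice, MPI_BYTE, rank - nsenders, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}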

The first approach is the best I've managed so far, but the bandwidth reached is still lower than what I observe with the ping-pong benchmark. Also, the performance depends on the number of sending ranks and drops when there are too many of them.

In the second approach one pays for a data copy. My thinking was that since the effective memory bandwidth available on a single modern CPU is larger than the IB bandwidth, I could squeeze out some performance by combining double buffering and multithreading, e.g.,

Step 1. thread A sends the data in the current buffer, while, behind the scenes, thread B copies the next chunk of data into the other buffer
Step 2. the buffers are swapped
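In code, the pipeline I have in mind looks roughly like this. To keep the sketch simple I've replaced thread B by a nonblocking MPI_Isend, so the copy into the other pre-registered buffer overlaps with the transfer of the current one; the buffer sizes and names are made up:

/* Double-buffering sketch: the receiver is assumed to post one MPI_Recv of
 * up to CHUNK bytes per iteration. */
#include <mpi.h>
#include <string.h>

enum { CHUNK = 8 * 1024 * 1024 };   /* size of each pre-registered staging buffer */

static void send_pipelined(const char *src, size_t total, int dest, MPI_Comm comm)
{
    char *buf[2];
    /* MPI_Alloc_mem gives the library a chance to register this memory with
     * the HCA once, so the registration cost is paid up front. */
    MPI_Alloc_mem(CHUNK, MPI_INFO_NULL, &buf[0]);
    MPI_Alloc_mem(CHUNK, MPI_INFO_NULL, &buf[1]);

    int cur = 0;
    size_t off = 0;
    size_t len = (total < CHUNK) ? total : CHUNK;
    memcpy(buf[cur], src, len);

    while (off < total) {
        MPI_Request req;
        /* "Thread A": the current buffer goes out on the wire.            */
        MPI_Isend(buf[cur], (int)len, MPI_BYTE, dest, 0, comm, &req);

        /* "Thread B": stage the next chunk into the other buffer.         */
        size_t next_off = off + len;
        size_t next_len = 0;
        if (next_off < total) {
            next_len = (total - next_off < CHUNK) ? total - next_off : CHUNK;
            memcpy(buf[1 - cur], src + next_off, next_len);
        }

        MPI_Wait(&req, MPI_STATUS_IGNORE);
        off = next_off;
        len = next_len;
        cur = 1 - cur;          /* Step 2: swap the buffers                */
    }

    MPI_Free_mem(buf[0]);
    MPI_Free_mem(buf[1]);
}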

A similar idea would be to use MPI_Get on the remote rank: the sender would copy the next chunk of data into the second buffer while the RMA window exposing the first buffer is open. In theory, I would expect those two operations to execute simultaneously, with the memory copy hopefully hidden behind the IB transfer.
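A very rough sketch of that variant, with rank 0 exposing two CHUNK-sized halves in one window and rank 1 pulling with MPI_Get -- the names, the sizes and the fence-based synchronization are my assumptions, not a worked-out implementation:

/* MPI_Get sketch: rank 0 keeps staging the next chunk into the half of the
 * window that is not being read, while rank 1 pulls the exposed half. */
#include <mpi.h>
#include <string.h>

enum { CHUNK = 8 * 1024 * 1024 };

void pull_pipelined(const char *src, char *dst, size_t total,
                    int rank, MPI_Comm comm)
{
    char *stage = NULL;
    MPI_Win win;
    /* Only rank 0 exposes memory; rank 1 contributes a zero-sized window. */
    MPI_Win_allocate(rank == 0 ? 2 * CHUNK : 0, 1, MPI_INFO_NULL,
                     comm, &stage, &win);

    size_t nchunks = (total + CHUNK - 1) / CHUNK;
    if (rank == 0 && nchunks > 0)
        memcpy(stage, src, (total < CHUNK) ? total : CHUNK);

    MPI_Win_fence(0, win);
    for (size_t i = 0; i < nchunks; i++) {
        size_t len = (i + 1 == nchunks) ? total - i * CHUNK : CHUNK;

        if (rank == 1) {
            /* Read chunk i from the half that rank 0 filled last epoch.   */
            MPI_Get(dst + i * CHUNK, (int)len, MPI_BYTE,
                    0, (MPI_Aint)((i % 2) * CHUNK), (int)len, MPI_BYTE, win);
        } else if (i + 1 < nchunks) {
            /* Meanwhile, rank 0 stages chunk i+1 into the other half.     */
            size_t nlen = (i + 2 == nchunks) ? total - (i + 1) * CHUNK : CHUNK;
            memcpy(stage + ((i + 1) % 2) * CHUNK, src + (i + 1) * CHUNK, nlen);
        }
        MPI_Win_fence(0, win);  /* close epoch i, open epoch i+1           */
    }
    MPI_Win_free(&win);
}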

Of course, the experiments didn't really work out. The first (multi-rank) approach is OK and shows some improvement, but the bandwidth still falls short of the ping-pong numbers. None of my double-buffering approaches helped at all, possibly because of memory bandwidth contention.

So I was wondering, have any of you had experience with a similar scenario? What would you recommend as the best approach?

Thanks a lot!

Marcin

