Galen Shipman wrote:
> Hi Jean,
>
> You probably are not seeing overhead costs so much as you are seeing
> the difference between using send/recv for small messages, which Open
> MPI uses, and RDMA for small messages. If you are comparing against
> another implementation that uses RDMA for small messages then yes, you
> will see lower latencies, but there are issues with using small message
> RDMA. I have written a paper that addresses these issues which will be
> presented at IPDPS.

I've been working on the MVAPICH project for around three years. Since this thread is discussing MVAPICH, I thought I should post to it.

Galen's description of MVAPICH is not accurate. MVAPICH uses RDMA for short messages to deliver performance benefits to applications. However, this needs to be designed properly to handle scalability while delivering the best performance.

Since MVAPICH-0.9.6 (released on 6th December, 2005), MVAPICH has supported a new mode of operation called ADAPTIVE_RDMA_FAST_PATH (the basic RDMA_FAST_PATH is also supported). This new design uses RDMA for short message transfer in an intelligent and adaptive manner. In this mode, the memory allocation of MVAPICH is no longer static; instead, it is dynamic. It is an implementation of the approach Galen is suggesting: short message RDMA for a limited, user-controllable set of peers. MVAPICH already supports this feature.

This also means that in the paper Galen mentions, the comparison results in Figures 4 through 7 have to be re-evaluated to make the paper and the results accurate.

Hope this helps.

Thanks,
Sayantan.

--
http://www.cse.ohio-state.edu/~surs
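
For readers unfamiliar with the idea, the adaptive behaviour described above (short message RDMA for a limited, user-controllable set of peers, with buffers allocated on demand rather than statically) can be pictured with a toy model. This is only an illustrative sketch and not MVAPICH code: the class name, the `max_rdma_peers` cap, and the `promote_after` message-count threshold are all hypothetical, and the real promotion policy may well differ.

```python
class AdaptiveFastPath:
    """Toy model of adaptive short-message RDMA (hypothetical, not MVAPICH code)."""

    def __init__(self, max_rdma_peers, promote_after):
        self.max_rdma_peers = max_rdma_peers  # assumed user-controllable cap on RDMA peers
        self.promote_after = promote_after    # assumed message count before promotion
        self.msg_counts = {}                  # short messages seen per peer so far
        self.rdma_peers = set()               # peers that currently own RDMA buffers

    def channel_for(self, peer):
        """Return the channel used for the next short message to `peer`."""
        if peer in self.rdma_peers:
            return "rdma"
        count = self.msg_counts.get(peer, 0) + 1
        self.msg_counts[peer] = count
        if count >= self.promote_after and len(self.rdma_peers) < self.max_rdma_peers:
            # Buffers are set up here, on demand, instead of for every peer at startup.
            # The message that triggers promotion itself still uses send/recv.
            self.rdma_peers.add(peer)
        return "send/recv"
```

In this model, a peer that exchanges few short messages never consumes RDMA buffer memory, and once the cap is reached the remaining peers stay on send/recv, which is the scalability trade-off under discussion.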