Dear Jeff,

Thank you for the reply.
I am doing pairwise send-recv rather than all-to-all, since not all of the data is required by all of the ranks. I am using blocking send and recv calls because there are multiple iterations of such message chunks to be sent, with synchronization between them.

I understand your recommendation in the mail below; however, I still see a benefit for my application-level algorithm in doing pairwise send-recv chunks where each chunk is within the eager limit. Since the input and output buffer is the same within a process, I can avoid certain buffering at each sender rank by issuing successive send calls within the eager limit to the receiver ranks and only then posting the recv calls.

Please do suggest if you still feel I am going in the wrong direction.

With Regards,
S. Biplab Raut

-----Original Message-----
From: Jeff Squyres (jsquyres) <jsquy...@cisco.com>
Sent: Wednesday, March 25, 2020 10:22 PM
To: Open MPI User's List <users@lists.open-mpi.org>
Cc: George Bosilca <bosi...@icl.utk.edu>; Raut, S Biplab <biplab.r...@amd.com>
Subject: Re: [OMPI users] Regarding eager limit relationship to send message size

On Mar 25, 2020, at 4:49 AM, Raut, S Biplab via users <users@lists.open-mpi.org> wrote:
>
> Let's say the application is running with 128 ranks.
> Each rank does a send() of a msg to the other 127 ranks, where the length of the msg sent is the point in question.
> After all the sends are completed, each rank will recv() a msg from the other 127 ranks.
> Unless the msg length on the sending side is within the eager_limit (4K size), this program will hang.
> So, based on the above scenario, my questions are:
> • Can each rank successfully send a message of up to 4K size, i.e., all 128 ranks sending (128 * 4K) bytes simultaneously?
> • If the application has a bigger msg to be sent by each rank, then how do I derive the send message size? Is it equal to the eager_limit, with each rank needing to send multiple chunks of that size?

I think you're asking the wrong questions.

MPI, as a standard, intentionally hides all the transport details from you. You have some additional controls from implementations such as Open MPI (e.g., MCA params that control the eager limit sizes), but that's not really the point -- those are really intended for optimizations on specific hardware.

From an application perspective, you should *not* be writing your algorithms tuned to the eager limit size of a given MPI implementation. If you have blocking issues (potentially due to eager limit issues), then you should be using non-blocking MPI communication. Or, in your case, perhaps you should be using MPI collective operations (since everyone is sending to everyone).

--
Jeff Squyres
jsquy...@cisco.com
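
For reference, a minimal sketch of the non-blocking pattern suggested above: post all the receives, then all the sends, and complete everything with a single MPI_Waitall. The per-peer buffer layout, the datatype, and the chunk size are illustrative assumptions (and it assumes separate send and receive buffers), not the actual application code.

/* Sketch only: buffer layout, datatype, and chunk size are assumed for
 * illustration; this is not the application's actual code. */
#include <mpi.h>
#include <stdlib.h>

void pairwise_exchange(const double *sendbuf, double *recvbuf,
                       int chunk, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    /* One receive request and one send request per peer (self excluded). */
    MPI_Request *reqs = malloc(2 * (size_t)size * sizeof(MPI_Request));
    int n = 0;

    /* Post every receive first, then every send, so completion does not
     * depend on messages fitting under the eager limit. */
    for (int peer = 0; peer < size; ++peer) {
        if (peer == rank) continue;
        MPI_Irecv(recvbuf + (size_t)peer * chunk, chunk, MPI_DOUBLE,
                  peer, 0, comm, &reqs[n++]);
    }
    for (int peer = 0; peer < size; ++peer) {
        if (peer == rank) continue;
        MPI_Isend(sendbuf + (size_t)peer * chunk, chunk, MPI_DOUBLE,
                  peer, 0, comm, &reqs[n++]);
    }

    MPI_Waitall(n, reqs, MPI_STATUSES_IGNORE);
    free(reqs);
}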
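
And a sketch of the collective alternative: MPI_Alltoallv allows a different count per peer (including zero), so "not all of the data is required by all ranks" does not by itself rule out a collective. The count arrays, buffer names, and use of MPI_DOUBLE below are assumptions for illustration; if the send and receive data really do live in the same buffer, MPI_IN_PLACE may also be worth looking at.

/* Sketch only: per-peer counts are assumed to be known already; names and
 * the use of MPI_DOUBLE are illustrative. */
#include <mpi.h>
#include <stdlib.h>

void exchange_alltoallv(const double *sendbuf, const int *sendcounts,
                        double *recvbuf, const int *recvcounts, MPI_Comm comm)
{
    int size;
    MPI_Comm_size(comm, &size);

    int *sdispls = malloc((size_t)size * sizeof(int));
    int *rdispls = malloc((size_t)size * sizeof(int));

    /* Displacements are prefix sums of the counts. */
    sdispls[0] = 0;
    rdispls[0] = 0;
    for (int p = 1; p < size; ++p) {
        sdispls[p] = sdispls[p - 1] + sendcounts[p - 1];
        rdispls[p] = rdispls[p - 1] + recvcounts[p - 1];
    }

    /* A count of 0 for a given peer simply means no data is exchanged with
     * that peer, so the exchange need not be "all data to all ranks". */
    MPI_Alltoallv(sendbuf, sendcounts, sdispls, MPI_DOUBLE,
                  recvbuf, recvcounts, rdispls, MPI_DOUBLE, comm);

    free(sdispls);
    free(rdispls);
}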