Dear George,
Thank you for your reply. I understood your point that I
should implement a correct communication scheme instead of using the eager
limit as a parameter.
However, my intention in relying on the eager limit is to avoid
application-level memory allocation at the sending process.
I am doing pairwise communication (a single in/out buffer), not all-to-all
communication. I want to take advantage of the message buffering done by the
eager protocol only at the receiving process.
I want to avoid the application-level buffering (required for the problem I am
trying to solve) at the sending process.
(As per the eager protocol, it is the responsibility of the receiving process
to buffer the message upon its arrival, especially if the matching receive has
not been posted.)
Is there any Open MPI API to query the eager limit? Or do I have to check the
MCA variables myself and pass the value to the application if I want to use it?
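For context, this is the kind of in-code query I have in mind: a rough sketch
using the MPI-3 tool information interface (MPI_T). I am assuming here that
Open MPI exposes the MCA parameter as a control variable named
btl_vader_eager_limit and that its value fits in an unsigned integer; from the
command line, ompi_info --all | grep eager_limit shows the same values.

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    int provided, ncvar;

    MPI_Init(&argc, &argv);
    MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
    MPI_T_cvar_get_num(&ncvar);

    /* Scan all control variables for the (assumed) eager-limit name. */
    for (int i = 0; i < ncvar; i++) {
        char name[256];
        int name_len = sizeof(name);
        /* Only the variable name is needed; the other outputs are skipped. */
        MPI_T_cvar_get_info(i, name, &name_len, NULL, NULL, NULL,
                            NULL, NULL, NULL, NULL);
        if (strcmp(name, "btl_vader_eager_limit") == 0) {
            MPI_T_cvar_handle handle;
            int count;
            unsigned long long value = 0; /* assumed wide enough for the cvar */
            MPI_T_cvar_handle_alloc(i, NULL, &handle, &count);
            MPI_T_cvar_read(handle, &value);
            printf("btl_vader_eager_limit = %llu bytes\n", value);
            MPI_T_cvar_handle_free(&handle);
            break;
        }
    }

    MPI_T_finalize();
    MPI_Finalize();
    return 0;
}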
With Regards,
S. Biplab Raut
From: George Bosilca <[email protected]>
Sent: Wednesday, March 25, 2020 9:58 PM
To: Raut, S Biplab <[email protected]>
Cc: Open MPI Users <[email protected]>
Subject: Re: [OMPI users] Regarding eager limit relationship to send message
size
On Wed, Mar 25, 2020 at 4:49 AM Raut, S Biplab <[email protected]> wrote:
Dear George,
Thank you for the reply. But my question is specifically about the message
size on the application side.
Let’s say the application is running with 128 ranks.
Each rank sends a message to each of the other 127 ranks, and the message
length is the open question.
After all the sends have completed, each rank receives a message from each of
the other 127 ranks.
Unless the message length on the sending side is within the eager limit
(4 KiB), this program will hang.
This is definitely not true; one can imagine many communication patterns that
will ensure correctness for your all-to-all communication. As an example, you
can place your processes in a virtual ring and, at each step, send and recv
to/from process (my_rank + step) % comm_size. This communication pattern will
always be correct, independent of the eager size (as long as you correctly
order the send/recv for each pair).
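A minimal sketch of that scheme (my illustration only; MSG_SIZE and the buffer
layout are placeholders), using MPI_Sendrecv so the pairing is safe at every
step:

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

#define MSG_SIZE 4096 /* per-peer payload; any size works */

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char *sendbuf = malloc((size_t)size * MSG_SIZE);
    char *recvbuf = malloc((size_t)size * MSG_SIZE);
    memset(sendbuf, rank, (size_t)size * MSG_SIZE);

    /* Own slot: a local copy, no communication needed. */
    memcpy(recvbuf + (size_t)rank * MSG_SIZE,
           sendbuf + (size_t)rank * MSG_SIZE, MSG_SIZE);

    /* Virtual ring: at step s, send to (rank + s) % size and receive
     * from (rank - s + size) % size. Every send is matched by a receive
     * posted in the same call, so completion never depends on the
     * eager limit. */
    for (int step = 1; step < size; step++) {
        int to   = (rank + step) % size;
        int from = (rank - step + size) % size;
        MPI_Sendrecv(sendbuf + (size_t)to * MSG_SIZE, MSG_SIZE, MPI_BYTE,
                     to, 0,
                     recvbuf + (size_t)from * MSG_SIZE, MSG_SIZE, MPI_BYTE,
                     from, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

At every step each send has a matching receive already posted by its peer, so
the pattern completes even with a zero-byte eager size.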
So, based on the above scenario, my questions are:
1. Can each rank send a message of up to 4 KiB successfully, i.e., all 128
ranks sending (128 * 4 KiB) bytes simultaneously?
Potentially yes, but there are physical constraints (number of network links,
switch capabilities, ...) and memory limits. But if you have enough memory,
this could potentially work. I'm not saying this is correct or that it should
be done.
2. If the application has a bigger message to be sent by each rank, how should
the send message size be derived? Is it equal to the eager limit, with each
rank sending multiple chunks of this size?
Definitely not! You should never rely on the eager size to fix a complex
communication pattern. The rule of thumb should be: would my application work
correctly if the MPI library forced a zero-byte eager size? As suggested
above, the most suitable approach is to define a communication scheme that can
never deadlock.
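With Open MPI you can approximate that test at run time by forcing a tiny
eager size, for example something like:

mpirun --mca btl_vader_eager_limit 0 -np 128 ./app

(this is only an illustration: each BTL has its own eager-limit parameter and
may enforce a small minimum for its internal headers, but any near-zero value
will expose a pattern that depends on eager buffering).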
George.
With Regards,
S. Biplab Raut
From: George Bosilca <[email protected]>
Sent: Tuesday, March 24, 2020 9:01 PM
To: Open MPI Users <[email protected]>
Cc: Raut, S Biplab <[email protected]>
Subject: Re: [OMPI users] Regarding eager limit relationship to send message
size
Biplab,
The eager limit is a constant for each BTL, and it represents the portion of
the entire message that is sent eagerly along with the matching information.
So, if the question is how much memory is needed to store all the eager
messages, then the answer depends on the communication pattern of your
application:
- applications using only blocking messages might have at most one pending
communication per peer, so in the worst case any process needs at most
P * eager_size bytes of local storage for the eager data (e.g., with P = 128
ranks and a 4 KiB eager size, at most 128 * 4 KiB = 512 KiB per process);
- for applications using non-blocking communications, there is basically no
limit.
However, the good news is that you can change this limit to adapt to the needs
of your application(s).
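For the btl_vader_eager_limit you ask about below, that would look something
like:

mpirun --mca btl_vader_eager_limit 65536 -np 128 ./app

(the 64 KiB value is just an example of mine; ompi_info --all lists the actual
parameters and their defaults for each BTL).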
Hope this answers your question,
George.
On Tue, Mar 24, 2020 at 1:46 AM Raut, S Biplab via users
<[email protected]> wrote:
Dear Experts,
I would like to derive/calculate the maximum possible MPI send message size,
given the known value of btl_vader_eager_limit and the number of ranks.
Can anybody explain and confirm this?
With Regards,
S. Biplab Raut