I have an MPI configuration (x86) with multiple processes communicating
both within a server and between 2 servers (one comm group). There are more
processes than cores available on each server, so to avoid busy-polling,
processes block on message queues during intra-node communication: the
sender performs mpi_bsend and then sends a wakeup on the message queue; the
receiver blocks on the message queue, receives the tag in the wakeup, and
performs a blocking receive with ANY_SOURCE and that tag (msg Id). For
inter-node communication, I am using a UDP socket wakeup for the same purpose
(block rather than busy-wait). However, I am concerned about the rare instance
where a UDP wakeup is dropped (the network environment is fairly 'closed', but
it could happen; I could use TCP but am looking to avoid the overhead if
possible). In the
case where frequent periodic messages of the same type are sent, there
wouldn't be an issue (even if the same message tag is sent from multiple
processes, since ANY_SOURCE is used in mpi_recv), except that the final
message might never be processed (if a wakeup is dropped somewhere in the
middle of execution). On the other hand, in the case where infrequent
aperiodic messages are sent, if the associated UDP wakeup is dropped, the
receiver might never post a receive for a bsend call that was made. What are the
ramifications? Might this eventually lead to deadlock since the queue tail ?
I realize this architecture is somewhat problematic; however, I have some
design constraints and am wondering if there are any suggestions for
dealing with this potential issue.

Thank you,
Jay
