I have an MPI configuration (x86) with multiple processes communicating within a server and between 2 servers (one comm group). There are more processes than cores available on each server, so to avoid busy-polling, processes block on message queues during intra-node communication: the sender performs MPI_Bsend and then sends a message queue wakeup. The receiver blocks on the message queue, receives the tag in the wakeup, and performs a blocking receive with MPI_ANY_SOURCE and that tag (msg Id).

For inter-node communication, I am using a UDP socket wakeup for the same purpose (block rather than busy-wait). However, I am concerned about the rare case where a UDP wakeup is dropped (the network environment is fairly 'closed', but it could happen; I could use TCP but am looking to avoid the overhead if possible).

In the case where frequent periodic messages of the same type are sent, there wouldn't be an issue (even if the same message tag is sent from multiple processes, since MPI_ANY_SOURCE is used in MPI_Recv), except that the final message might never be processed if a wakeup is dropped somewhere in the middle of execution. On the other hand, in the case where infrequent aperiodic messages are sent, if the associated UDP wakeup is dropped, a receive might never be posted by the receiver for a bsend call that was made.

What are the ramifications - might this eventually lead to deadlock since the queue tail ? I realize this architecture is somewhat problematic; however, I have some design constraints and am wondering if there are any suggestions for dealing with this potential issue.
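
To make the inter-node wakeup path concrete, it can be sketched roughly as follows in Python, using sockets only and no MPI. Everything named here is an assumption for illustration: the wire format (a single 32-bit tag), the function names, and the `recv_by_tag` callback, which stands in for the blocking receive with MPI_ANY_SOURCE and the tag. The optional `timeout` is not part of the design described above; it is only there to show what the receiver sees when a wakeup never arrives.

```python
import socket
import struct

# Assumed wire format: one 32-bit signed tag in network byte order.
WAKEUP_FMT = "!i"


def send_wakeup(sock, addr, tag):
    """Sender side: called after the buffered send (e.g. MPI_Bsend) was issued."""
    sock.sendto(struct.pack(WAKEUP_FMT, tag), addr)


def wait_for_wakeup(sock, timeout=None):
    """Receiver side: block on the UDP socket until a wakeup datagram arrives.

    Returns the tag carried by the wakeup, or None if `timeout` seconds pass
    without a datagram. With timeout=None the call blocks indefinitely, which
    is the situation where a dropped wakeup leaves the receive unposted.
    """
    sock.settimeout(timeout)
    try:
        data, _ = sock.recvfrom(struct.calcsize(WAKEUP_FMT))
    except socket.timeout:
        return None
    (tag,) = struct.unpack(WAKEUP_FMT, data)
    return tag


def receive_once(sock, recv_by_tag, timeout=None):
    """Wait for one wakeup, then hand its tag to the (hypothetical) receive.

    `recv_by_tag(tag)` is a placeholder for the blocking receive on that tag.
    """
    tag = wait_for_wakeup(sock, timeout)
    if tag is None:
        return None  # no wakeup seen: nothing was received for any tag
    return recv_by_tag(tag)
```

In this sketch each wakeup triggers exactly one receive, so one lost datagram leaves one buffered message unmatched until some later wakeup for the same tag arrives.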
Thank you, Jay