I have now tried 1.8.5rc1. From my point of view it behaves very similarly to 1.8.4 (and completely differently from 1.6.5). The warning

    [warn] opal_libevent2021_event_base_loop: reentrant invocation. Only one event_base_loop can run on each event_base at once.

is still there.
It's easy for me to reproduce a deadlock with both 1.8.4 and 1.8.5rc1. With 1.8.5rc1, I sometimes even get the deadlock without the warning. Two things seem crucial for reproducing it:

1) start a worker on the same node as the master
2) chop big messages into 1k blocks

With 2k blocks, the deadlocks become rarer, and with 4k blocks (or no chopping at all), the deadlocks seem to be gone. The deadlock happens even with a single worker. Here is the backtrace:

#0 0x000000363f20e054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x000000363f209388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x000000363f209257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f9901d47343 in mca_btl_vader_component_progress () from /homes/data/public/Development/3rdParty/install/openmpi-1.8.5rc1/Linux-x86_64-redhat.6.3/M64/lib/openmpi/mca_btl_vader.so
#4 0x00007f9910a9b49a in opal_progress () from /homes/data/public/Development/3rdParty/install/openmpi-1.8.5rc1/Linux-x86_64-redhat.6.3/M64/lib/libopen-pal.so.6
#5 0x00007f990170594d in mca_pml_ob1_send () from /homes/data/public/Development/3rdParty/install/openmpi-1.8.5rc1/Linux-x86_64-redhat.6.3/M64/lib/openmpi/mca_pml_ob1.so
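
To make "chop big messages into 1k blocks" concrete, here is a minimal sketch of that kind of chopping loop. It is not the actual application code: the names `send_chopped`, `block_send_fn` and `count_bytes` are hypothetical, and the sender is a stand-in that only counts bytes so the sketch compiles without MPI. In an MPI program the callback would presumably wrap a call such as `MPI_Send(buf, (int)len, MPI_BYTE, dest, tag, comm)`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: sends one block and returns 0 on success.
 * In an MPI program this would wrap something like
 * MPI_Send(buf, (int)len, MPI_BYTE, dest, tag, comm). */
typedef int (*block_send_fn)(const void *buf, size_t len, void *ctx);

/* Send `total` bytes starting at `data` in blocks of at most `block_size`
 * bytes each. Returns the number of blocks sent, or -1 if a send failed. */
static long send_chopped(const void *data, size_t total, size_t block_size,
                         block_send_fn send_block, void *ctx)
{
    assert(block_size > 0);

    const unsigned char *p = data;
    size_t offset = 0;
    long blocks = 0;

    while (offset < total) {
        /* The last block may be shorter than block_size. */
        size_t len = total - offset;
        if (len > block_size) {
            len = block_size;
        }
        if (send_block(p + offset, len, ctx) != 0) {
            return -1;
        }
        offset += len;
        ++blocks;
    }
    return blocks;
}

/* Stand-in sender (assumption, for illustration only): adds the block
 * length to a size_t counter passed via ctx instead of sending anything. */
static int count_bytes(const void *buf, size_t len, void *ctx)
{
    (void)buf;
    *(size_t *)ctx += len;
    return 0;
}
```

Calling `send_chopped(msg, msg_len, 1024, count_bytes, &sent)` corresponds to the 1k case; changing the block size to 2048 or 4096 corresponds to the 2k and 4k cases described above.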