With the TCP BTL, when free list items are exhausted, OMPI 1.2.6 falls into what looks like infinite recursion (the backtrace is already almost 4000 frames deep):
#3981 0x0000002a98b4e23f in opal_condition_wait (c=0x2a98c541d0, m=0x2a98c54180) at ../../../../opal/threads/condition.h:81
#3982 0x0000002a98b4e0e9 in __ompi_free_list_wait (fl=0x2a98c540d0, item=0x7fa82af630) at ../../../../ompi/class/ompi_free_list.h:187
#3983 0x0000002a98b4dbd4 in mca_btl_tcp_endpoint_recv_handler (sd=18, flags=2, user=0xc20240) at btl_tcp_endpoint.c:611
#3984 0x0000002a95bf78de in opal_event_process_active (base=0xb81390) at event.c:464
#3985 0x0000002a95bf7c0a in opal_event_base_loop (base=0xb81390, flags=2) at event.c:603
#3986 0x0000002a95bf79c7 in opal_event_loop (flags=2) at event.c:517
#3987 0x0000002a95bf2227 in opal_progress () at runtime/opal_progress.c:259
#3988 0x0000002a98b4e23f in opal_condition_wait (c=0x2a98c541d0, m=0x2a98c54180) at ../../../../opal/threads/condition.h:81
#3989 0x0000002a98b4e0e9 in __ompi_free_list_wait (fl=0x2a98c540d0, item=0x7fa82af7f0) at ../../../../ompi/class/ompi_free_list.h:187
#3990 0x0000002a98b4dbd4 in mca_btl_tcp_endpoint_recv_handler (sd=22, flags=2, user=0xc2dcf0) at btl_tcp_endpoint.c:611
#3991 0x0000002a95bf78de in opal_event_process_active (base=0xb81390) at event.c:464
#3992 0x0000002a95bf7c0a in opal_event_base_loop (base=0xb81390, flags=2) at event.c:603
#3993 0x0000002a95bf79c7 in opal_event_loop (flags=2) at event.c:517
#3994 0x0000002a95bf2227 in opal_progress () at runtime/opal_progress.c:259
#3995 0x0000002a98b4e23f in opal_condition_wait (c=0x2a98c541d0, m=0x2a98c54180) at ../../../../opal/threads/condition.h:81

The call used to get a free list item is OMPI_FREE_LIST_WAIT(), which is supposed to block until an item is available. However, it calls opal_condition_wait(), which in turn calls opal_progress(), which runs the event loop and dispatches mca_btl_tcp_endpoint_recv_handler() for another socket, which then waits for a free list item again, and so on. Each pass adds seven frames to the stack and no item is ever returned to the list.

It seems strange to me that opal_condition_wait() calls opal_progress(), but I'm not that familiar with the code. Is it possible that this has been fixed in 1.3?
I haven't tried 1.3 yet because I will have to file a truckload of bugs against 1.3 first. Should I be posting this stuff to the devel list?

Thanks,
mch
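
P.S. In case it helps, here is a small stand-alone model of the re-entrancy as I understand it from the backtrace. It is only a sketch: every name in it (model_progress, model_recv_handler, model_free_list_wait, run_model, MAX_DEPTH) is made up for illustration and is not an Open MPI symbol, and the depth cap exists only so the demo terminates. As far as I can tell the real code path has no such bail-out.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of the recursion; none of these are Open MPI symbols. */
#define MAX_DEPTH 5              /* artificial cap so the demo terminates */

static int free_items = 0;       /* the free list is exhausted */
static int depth = 0;            /* current nesting of the receive handler */
static int max_depth_seen = 0;   /* deepest nesting reached */
static int gave_up = 0;          /* set once the artificial cap is hit */

static void model_progress(void);

/* Models a "blocking" free-list wait that spins on the progress engine. */
static int model_free_list_wait(void)
{
    while (free_items == 0) {
        if (gave_up || depth >= MAX_DEPTH) {
            gave_up = 1;         /* unwind every nested wait */
            return -1;
        }
        model_progress();        /* re-enters the event loop */
    }
    free_items--;
    return 0;
}

/* Models the receive handler: it needs a free-list item before reading. */
static void model_recv_handler(void)
{
    depth++;
    assert(depth <= MAX_DEPTH);
    if (depth > max_depth_seen) {
        max_depth_seen = depth;
    }
    printf("recv_handler entered at depth %d\n", depth);
    (void)model_free_list_wait();
    depth--;
}

/* Models the progress engine: some socket is always readable. */
static void model_progress(void)
{
    model_recv_handler();
}

/* Runs the model once and returns the deepest handler nesting observed. */
int run_model(void)
{
    free_items = 0;
    depth = 0;
    max_depth_seen = 0;
    gave_up = 0;
    model_progress();
    return max_depth_seen;
}
```

Calling run_model() prints five "recv_handler entered" lines and returns 5. Nothing inside the loop ever puts an item back on the list, so without the cap the nesting would simply keep growing until the stack runs out.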