Hi all,
I'm new here, so please be gentle :-)
Versions: OpenMPI 4.0.3rc1, UCX 1.7
I have a hang in an application: it runs fine with small data sets but
fails with a larger one. The error is
"bsend: failed to allocate buffer"
This comes from
pml_ucx.c:693
    mca_pml_ucx_bsend( ... )
    ...
        packed_data = mca_pml_base_bsend_request_alloc_buf(packed_length);
        if (OPAL_UNLIKELY(NULL == packed_data)) {
            OBJ_DESTRUCT(&opal_conv);
            PML_UCX_ERROR( "bsend: failed to allocate buffer");
            return UCS_STATUS_PTR(OMPI_ERROR);
        }
In fact the request appears to be 1.3 MB, and the bsend buffer is (or
should be!) 128 MB.
In pml_base_bsend:332
    void* mca_pml_base_bsend_request_alloc_buf( size_t length )
    {
        void* buf = NULL;
        /* has a buffer been provided */
        OPAL_THREAD_LOCK(&mca_pml_bsend_mutex);
        if(NULL == mca_pml_bsend_addr) {
            OPAL_THREAD_UNLOCK(&mca_pml_bsend_mutex);
            return NULL;
        }
        /* allocate a buffer to hold packed message */
        buf = mca_pml_bsend_allocator->alc_alloc(
            mca_pml_bsend_allocator, length, 0);
        if(NULL == buf) {
            /* release resources when request is freed */
            OPAL_THREAD_UNLOCK(&mca_pml_bsend_mutex);
            /* progress communications, with the hope that more resources
             * will be freed */
            opal_progress();
            return NULL;
        }
        /* increment count of pending requests */
        mca_pml_bsend_count++;
        OPAL_THREAD_UNLOCK(&mca_pml_bsend_mutex);
        return buf;
    }
There seems to be a strong hint here (the opal_progress() call and its
comment) that we could wait for bsend buffer space to become available,
and yet mca_pml_ucx_bsend has no retry mechanism and simply fails on the
first attempt. A simple hack in mca_pml_base_bsend_request_alloc_buf,
turning the "if(NULL == buf) {" into a "while(NULL == buf) {", seems to
support this (the application proceeds after a few milliseconds)...
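To make the idea concrete, here is a minimal sketch of the kind of bounded retry I have in mind. It is not Open MPI code: toy_pool_t, toy_alloc, toy_progress and toy_alloc_with_retry are hypothetical stand-ins for the bsend buffer, alc_alloc and opal_progress(), and the attempt limit is my own assumption so that a request that can never fit does not spin forever.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the attached bsend buffer: a byte counter only. */
typedef struct {
    size_t capacity;   /* total size of the attached buffer */
    size_t in_use;     /* bytes held by pending buffered sends */
    size_t in_flight;  /* bytes that the next progress call will release */
} toy_pool_t;

/* Stand-in for alc_alloc: returns 1 if 'len' bytes fit, 0 otherwise. */
static int toy_alloc(toy_pool_t *p, size_t len)
{
    if (p->capacity - p->in_use < len) {
        return 0;
    }
    p->in_use += len;
    return 1;
}

/* Stand-in for opal_progress(): completes in-flight sends, freeing space. */
static void toy_progress(toy_pool_t *p)
{
    p->in_use -= p->in_flight;
    p->in_flight = 0;
}

/* Try to allocate; on failure, progress and try again, up to max_attempts.
 * Returns the number of attempts used on success, or 0 if all failed. */
static int toy_alloc_with_retry(toy_pool_t *p, size_t len, int max_attempts)
{
    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (toy_alloc(p, len)) {
            return attempt;
        }
        toy_progress(p);  /* hope that pending sends release buffer space */
    }
    return 0;
}
```

In this toy model a request that fails on the first attempt succeeds on the second once progress has released space, which matches what I see with the hack. A request larger than the whole buffer still fails after the attempt limit instead of looping.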
Is this hypothesis correct?
Best regards, Martyn