Hi all,

I'm new here, so please be gentle :-)

Versions: OpenMPI 4.0.3rc1, UCX 1.7

I have a hang in an application (OK for small data sets, but fails with a
larger one). The error is

"bsend: failed to allocate buffer"

This comes from

pml_ucx.c:693
mca_pml_ucx_bsend( ... )
...
    packed_data = mca_pml_base_bsend_request_alloc_buf(packed_length);
    if (OPAL_UNLIKELY(NULL == packed_data)) {
        OBJ_DESTRUCT(&opal_conv);
        PML_UCX_ERROR( "bsend: failed to allocate buffer");
        return UCS_STATUS_PTR(OMPI_ERROR);
    }

In fact the request appears to be 1.3 MB, and the bsend buffer is (or
should be!) 128 MB.

In pml_base_bsend.c:332:
void*  mca_pml_base_bsend_request_alloc_buf( size_t length )
{
    void* buf = NULL;
    /* has a buffer been provided */
    OPAL_THREAD_LOCK(&mca_pml_bsend_mutex);
    if(NULL == mca_pml_bsend_addr) {
        OPAL_THREAD_UNLOCK(&mca_pml_bsend_mutex);
        return NULL;
    }

    /* allocate a buffer to hold packed message */
    buf = mca_pml_bsend_allocator->alc_alloc(
        mca_pml_bsend_allocator, length, 0);
    if(NULL == buf) {
        /* release resources when request is freed */
        OPAL_THREAD_UNLOCK(&mca_pml_bsend_mutex);
        /* progress communications, with the hope that more resources
         *   will be freed */
        opal_progress();
        return NULL;
    }

    /* increment count of pending requests */
    mca_pml_bsend_count++;
    OPAL_THREAD_UNLOCK(&mca_pml_bsend_mutex);

    return buf;
}

The opal_progress() call and its comment seem to be a strong hint that we
can wait for the bsend buffer to become available, and yet
mca_pml_ucx_bsend doesn't have a retry mechanism and just fails on the
first attempt. A simple hack to turn the "if(NULL == buf) {" into a
"while(NULL == buf) {" in mca_pml_base_bsend_request_alloc_buf seems to
support this (the application proceeds after a few milliseconds)...

Is this hypothesis correct?

Best regards, Martyn
