Martyn,

I don't know exactly what your code is doing, but based on your inquiry I
assume you are using MPI_BSEND multiple times and you run out of local
buffers.

The MPI standard does not mandate a wait until buffer space becomes
available, because that can lead to deadlocks (for example, when the
communication pattern depends on a local receive that is only posted after
the bsend loop).
Instead, the MPI standard makes it the user's responsibility to ensure that
enough buffer space is available before calling MPI_BSEND. MPI3.2 page 39,
line 36: "then MPI must buffer the outgoing message, so as to allow the send
to complete. An error will occur if there is insufficient buffer space". For
blocking buffered sends this is a gray area because from a user perspective
it is difficult to know when you can safely reuse the buffer without
implementing some kind of feedback mechanism to confirm the reception. For
nonblocking buffered sends the constraint is relaxed, as indicated on page 55 line 33,
"Successful return of MPI_WAIT after a MPI_IBSEND implies that the user
buffer can be reused".

In short, you should always make sure you have enough available buffer
space for your buffered sends to be able to locally pack the data to be
sent, or be ready to deal with the error returned by MPI (this part would
not be portable across different MPI implementations).
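
To make the sizing concrete, here is a minimal sketch of the accounting,
written in plain C so it can be read without an MPI installation. Both
function names are hypothetical helpers of mine, not part of MPI or Open MPI.
In real code the packed sizes would come from MPI_Pack_size and the
per-message overhead would be the MPI_BSEND_OVERHEAD constant from mpi.h;
here they are ordinary parameters.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of bytes to attach with MPI_Buffer_attach so
 * that `count` buffered sends can all be pending at the same time.
 *
 * packed_sizes[i]      - packed size of message i (from MPI_Pack_size)
 * per_message_overhead - MPI_BSEND_OVERHEAD in real code
 */
size_t bsend_buffer_bytes(const size_t *packed_sizes, size_t count,
                          size_t per_message_overhead)
{
    size_t total = 0;

    assert(packed_sizes != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        /* each pending message costs its packed payload plus the overhead */
        total += packed_sizes[i] + per_message_overhead;
    }
    return total;
}

/*
 * Hypothetical helper: upper bound on how many messages of one packed size
 * can be pending at once in an attached buffer of `buffer_bytes` bytes.
 * An implementation may manage fewer, e.g. because of fragmentation.
 */
size_t bsend_max_pending(size_t buffer_bytes, size_t packed_size,
                         size_t per_message_overhead)
{
    size_t per_message = packed_size + per_message_overhead;

    return per_message == 0 ? 0 : buffer_bytes / per_message;
}
```

The value returned by bsend_buffer_bytes is what you would allocate and hand
to MPI_Buffer_attach. If the number of MPI_BSEND calls that can be in flight
before the receiver drains them exceeds what bsend_max_pending reports for
your buffer, you should expect the allocation failure you are seeing.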

  George.




On Tue, Mar 17, 2020 at 7:59 AM Martyn Foster via users <
users@lists.open-mpi.org> wrote:

> Hi all,
>
> I'm new here, so please be gentle :-)
>
> Versions: OpenMPI 4.0.3rc1, UCX 1.7
>
> I have a hang in an application (OK for small data sets, but fails with a
> larger one). The error is
>
> "bsend: failed to allocate buffer"
>
> This comes from
>
> pml_ucx.c:693
> mca_pml_ucx_bsend( ... )
> ...
> packed_data = mca_pml_base_bsend_request_alloc_buf(packed_length);
>     if (OPAL_UNLIKELY(NULL == packed_data)) {
>         OBJ_DESTRUCT(&opal_conv);
>         PML_UCX_ERROR( "bsend: failed to allocate buffer");
>         return UCS_STATUS_PTR(OMPI_ERROR);
>     }
>
> In fact the request appears to be 1.3MB and the bsend buffer is (should
> be!) 128MB
>
> In pml_base_bsend:332
> void*  mca_pml_base_bsend_request_alloc_buf( size_t length )
> {
>    void* buf = NULL;
>     /* has a buffer been provided */
>     OPAL_THREAD_LOCK(&mca_pml_bsend_mutex);
>     if(NULL == mca_pml_bsend_addr) {
>         OPAL_THREAD_UNLOCK(&mca_pml_bsend_mutex);
>         return NULL;
>     }
>
>     /* allocate a buffer to hold packed message */
>     buf = mca_pml_bsend_allocator->alc_alloc(
>         mca_pml_bsend_allocator, length, 0);
>     if(NULL == buf) {
>         /* release resources when request is freed */
>         OPAL_THREAD_UNLOCK(&mca_pml_bsend_mutex);
>         /* progress communications, with the hope that more resources
>          *   will be freed */
>         opal_progress();
>         return NULL;
>     }
>
>     /* increment count of pending requests */
>     mca_pml_bsend_count++;
>     OPAL_THREAD_UNLOCK(&mca_pml_bsend_mutex);
>
>     return buf;
> }
>
> It seems that there is a strong hint here that we can wait for the bsend
> buffer to become available, and yet mca_pml_ucx_bsend doesn't have a retry
> mechanism and just fails on the first attempt. A simple hack to turn the
> "if(NULL == buf) {" into a "while(NULL == buf) {" in
> mca_pml_base_bsend_request_alloc_buf seems to support this (the
> application proceeds after a few milliseconds)...
>
> Is this hypothesis correct?
>
> Best regards, Martyn
>
>
>
