Hi George,

Thanks for the reply. I agree that the behaviour isn't outside the MPI standard (perhaps I shouldn't have used "fault" in the title!). From a utility perspective, my point is that it's undesirable for the application to hard-stop when it could apparently proceed safely given a modest code change that stalls until other operations complete and free buffer space. Is it worth proposing a patch to that effect?
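What I have in mind is roughly the following (an untested sketch against mca_pml_ucx_bsend() in pml_ucx.c; a real patch would presumably need a retry cap or timeout so it cannot spin forever in the deadlock scenario you describe, falling back to the existing PML_UCX_ERROR path):

    /* Sketch only: retry the allocation rather than failing on the first
     * attempt. mca_pml_base_bsend_request_alloc_buf() already calls
     * opal_progress() when it cannot allocate, so pending buffered sends
     * get a chance to complete and release space between attempts. */
    packed_data = mca_pml_base_bsend_request_alloc_buf(packed_length);
    while (OPAL_UNLIKELY(NULL == packed_data)) {
        packed_data = mca_pml_base_bsend_request_alloc_buf(packed_length);
    }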
Of course the application could be coded better in this area (I've put a rough sketch of the sort of user-side sizing you describe below the quoted text), but these things are not always trivial; in this case it appears I would need to allocate more than 1 GB to execute reliably without a mod to the OMPI source.

Martyn

On Tue, 17 Mar 2020 at 15:20, George Bosilca <bosi...@icl.utk.edu> wrote:

> Martyn,
>
> I don't know exactly what your code is doing, but based on your inquiry I
> assume you are using MPI_BSEND multiple times and you run out of local
> buffers.
>
> The MPI standard does not mandate a wait until buffer space becomes
> available, because that can lead to deadlocks (communication pattern
> depends on a local receive that will be posted after the bsend loop).
> Instead, the MPI standard states it is the user's responsibility to ensure
> enough buffer is available before calling MPI_BSEND, MPI3.2 page 39 line
> 36, "then MPI must buffer the outgoing message, so as to allow the send to
> complete. An error will occur if there is insufficient buffer space". For
> blocking buffered sends this is a gray area because from a user perspective
> it is difficult to know when you can safely reuse the buffer without
> implementing some kind of feedback mechanism to confirm the reception. For
> nonblocking the constraint is relaxed as indicated on page 55 line 33,
> "Successful return of MPI_WAIT after a MPI_IBSEND implies that the user
> buffer can be reused".
>
> In short, you should always make sure you have enough available buffer
> space for your buffered sends to be able to locally pack the data to be
> sent, or be ready to deal with the error returned by MPI (this part would
> not be portable across different MPI implementations).
>
> George.
>
>
> On Tue, Mar 17, 2020 at 7:59 AM Martyn Foster via users <
> users@lists.open-mpi.org> wrote:
>
>> Hi all,
>>
>> I'm new here, so please be gentle :-)
>>
>> Versions: OpenMPI 4.0.3rc1, UCX 1.7
>>
>> I have a hang in an application (OK for small data sets, but fails with a
>> larger one). The error is
>>
>>     "bsend: failed to allocate buffer"
>>
>> This comes from
>>
>> pml_ucx.c:693
>> mca_pml_ucx_bsend( ... )
>> ...
>>     packed_data = mca_pml_base_bsend_request_alloc_buf(packed_length);
>>     if (OPAL_UNLIKELY(NULL == packed_data)) {
>>         OBJ_DESTRUCT(&opal_conv);
>>         PML_UCX_ERROR( "bsend: failed to allocate buffer");
>>         return UCS_STATUS_PTR(OMPI_ERROR);
>>     }
>>
>> In fact the request appears to be 1.3MB and the bsend buffer is (should
>> be!) 128MB.
>>
>> In pml_base_bsend.c:332
>>
>> void* mca_pml_base_bsend_request_alloc_buf( size_t length )
>> {
>>     void* buf = NULL;
>>     /* has a buffer been provided */
>>     OPAL_THREAD_LOCK(&mca_pml_bsend_mutex);
>>     if(NULL == mca_pml_bsend_addr) {
>>         OPAL_THREAD_UNLOCK(&mca_pml_bsend_mutex);
>>         return NULL;
>>     }
>>
>>     /* allocate a buffer to hold packed message */
>>     buf = mca_pml_bsend_allocator->alc_alloc(
>>         mca_pml_bsend_allocator, length, 0);
>>     if(NULL == buf) {
>>         /* release resources when request is freed */
>>         OPAL_THREAD_UNLOCK(&mca_pml_bsend_mutex);
>>         /* progress communications, with the hope that more resources
>>          * will be freed */
>>         opal_progress();
>>         return NULL;
>>     }
>>
>>     /* increment count of pending requests */
>>     mca_pml_bsend_count++;
>>     OPAL_THREAD_UNLOCK(&mca_pml_bsend_mutex);
>>
>>     return buf;
>> }
>>
>> It seems that there is a strong hint here that we can wait for the bsend
>> buffer to become available, and yet mca_pml_ucx_bsend doesn't have a retry
>> mechanism and just fails on the first attempt.
>> A simple hack to turn the "if(NULL == buf) {" into a "while(NULL == buf) {"
>> in mca_pml_base_bsend_request_alloc_buf seems to support this (the
>> application proceeds after a few milliseconds)...
>>
>> Is this hypothesis correct?
>>
>> Best regards, Martyn
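PS For reference, the user-side sizing George describes would look something like the sketch below. The message count, element count and datatype here are purely illustrative, not taken from my application; the point is that each pending MPI_Bsend needs room for the packed message plus MPI_BSEND_OVERHEAD bytes.

    #include <mpi.h>
    #include <stdlib.h>

    /* Illustrative only: attach a buffer large enough for the worst case
     * of nmsgs buffered sends of `count` MPI_DOUBLEs outstanding at once. */
    static void attach_bsend_buffer(int nmsgs, int count)
    {
        int pack_size = 0;
        MPI_Pack_size(count, MPI_DOUBLE, MPI_COMM_WORLD, &pack_size);

        int buf_size = nmsgs * (pack_size + MPI_BSEND_OVERHEAD);
        void *buf = malloc(buf_size);
        MPI_Buffer_attach(buf, buf_size);
    }

The non-trivial part in my case is knowing those counts up front, which is why attaching well over 1 GB was the only reliable option without the source mod.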