Jeff,

The first location is indeed in ompi_coll_libnbc_iallreduce().


Lee Ann,


Thanks for the bug report.

For the time being, can you please give the attached patch a try?


Cheers,


Gilles



FWIW

NBC_Schedule_request() sets handle->tmpbuf = tmpbuf and calls NBC_Start(handle, schedule),

which then sets handle->schedule = schedule.

If NBC_Start_round() fails, NBC_Return_handle(handle) calls NBC_Free(handle), which does OBJ_RELEASE(handle->schedule) and free(handle->tmpbuf).

Back in ompi_coll_libnbc_iallreduce(), we then OBJ_RELEASE(schedule) and free(tmpbuf) a second time, hence the double free.
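The ownership problem in this sequence can be reproduced with a small C model. All model_* names below are hypothetical stand-ins for NBC_Schedule, NBC_Handle, NBC_Free() and the iallreduce error path, not Open MPI code, and the NBC_Start_round() failure is simulated. Releases and frees are only counted, so the double release shows up as bad counters instead of a crash.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the libnbc objects (not the real types).
 * Releases and frees are only counted, so the double release is visible
 * without triggering undefined behavior. */
typedef struct { int refcount; } model_schedule;
typedef struct { model_schedule *schedule; void *tmpbuf; } model_handle;

static int tmpbuf_frees;

static void model_release(model_schedule *s) { if (s != NULL) s->refcount--; }
static void model_free_buf(void *buf) { if (buf != NULL) tmpbuf_frees++; }

/* Plays the role of NBC_Free(): drops whatever the handle still points to. */
static void model_handle_free(model_handle *h) {
  model_release(h->schedule);
  model_free_buf(h->tmpbuf);
}

/* Plays the role of NBC_Schedule_request() when NBC_Start_round() fails:
 * the handle takes the caller's pointers, then NBC_Return_handle() ->
 * NBC_Free() releases them. */
static int model_schedule_request(model_handle *h, model_schedule *s,
                                  void *tmpbuf) {
  assert(h != NULL);
  h->tmpbuf = tmpbuf;
  h->schedule = s;
  model_handle_free(h);
  return -1; /* simulated error code */
}

/* Plays the role of the error path in ompi_coll_libnbc_iallreduce(). */
static int model_iallreduce(model_schedule *s, void *tmpbuf) {
  model_handle h = { NULL, NULL };
  int res = model_schedule_request(&h, s, tmpbuf);
  if (res != 0) {
    model_release(s);       /* second release of the same schedule */
    model_free_buf(tmpbuf); /* second free of the same buffer */
  }
  return res;
}
```

Starting from a schedule with refcount 1, model_iallreduce() returns -1 and leaves the refcount at -1 and tmpbuf_frees at 2: one release and one free too many.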




On 3/15/2019 7:27 AM, Jeff Squyres (jsquyres) via users wrote:
Lee Ann --

Thanks for your bug report.

I'm not able to find a call to NBC_Schedule_request() in 
ompi_coll_libnbc_iallreduce().

I see 2 calls to NBC_Schedule_request() in 
ompi/mca/coll/libnbc/nbc_iallreduce.c, but they are in different functions.

Can you clarify exactly which one(s) you're referring to?

Location 1:
https://github.com/open-mpi/ompi/blob/7412e88689ffd1dcc95a587e6ded9d7455fa031c/ompi/mca/coll/libnbc/nbc_iallreduce.c#L183

Location 2:
https://github.com/open-mpi/ompi/blob/7412e88689ffd1dcc95a587e6ded9d7455fa031c/ompi/mca/coll/libnbc/nbc_iallreduce.c#L247



On Mar 14, 2019, at 4:27 PM, Riesen, Lee Ann <lee.ann.rie...@intel.com> wrote:

I'm trying to build OpenMPI 3.1.2 as part of Mellanox HPC-X and I'm having some
problems with the underlying libraries.  The true problem was masked for a while
by a bug in error handling in OpenMPI.  In mca/coll/libnbc/nbc_iallreduce.c, in
function ompi_coll_libnbc_iallreduce(), we have some error handling at the end
that looks like:
  res = NBC_Schedule_request (schedule, comm, libnbc_module, request, tmpbuf);
  if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) {
    OBJ_RELEASE(schedule);
    free(tmpbuf);
    return res;
  }
The NBC_Schedule_request() call failed, and inside that call "schedule" and "tmpbuf" were already freed. Then we return to the caller and free "schedule" and "tmpbuf" a second time. It looks like the same pattern occurs elsewhere in the source file too.

Lee Ann
-----
Lee Ann Riesen, Enterprise and Government Group, Intel Corporation, Hillsboro, OR
Phone 503-613-1952
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

diff --git a/ompi/mca/coll/libnbc/nbc.c b/ompi/mca/coll/libnbc/nbc.c
index dff6362..43eec5f 100644
--- a/ompi/mca/coll/libnbc/nbc.c
+++ b/ompi/mca/coll/libnbc/nbc.c
@@ -720,6 +720,9 @@ int NBC_Schedule_request(NBC_Schedule *schedule, ompi_communicator_t *comm, ompi
 
   res = NBC_Start (handle, schedule);
   if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) {
+    /* temporary workaround to avoid double free */
+    handle->tmpbuf = NULL;
+    handle->schedule = NULL;
     NBC_Return_handle (handle);
     return res;
   }
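The effect of this workaround can be checked with a similar small C model. The model_* names are hypothetical stand-ins rather than Open MPI code, the NBC_Start() failure is simulated, and releases and frees are only counted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real libnbc types; releases are counted. */
typedef struct { int refcount; } model_schedule;
typedef struct { model_schedule *schedule; void *tmpbuf; } model_handle;

static int tmpbuf_frees;

static void model_release(model_schedule *s) { if (s != NULL) s->refcount--; }
static void model_free_buf(void *buf) { if (buf != NULL) tmpbuf_frees++; }

/* Like NBC_Free(): only releases what the handle still points to. */
static void model_handle_free(model_handle *h) {
  model_release(h->schedule);
  model_free_buf(h->tmpbuf);
}

/* NBC_Schedule_request() with the patch applied: on failure the handle
 * forgets the caller's pointers before being returned, so ownership of
 * schedule and tmpbuf stays with the caller. */
static int model_schedule_request_patched(model_handle *h, model_schedule *s,
                                          void *tmpbuf) {
  assert(h != NULL);
  h->tmpbuf = tmpbuf;
  h->schedule = s;
  /* NBC_Start() would run here; simulate its failure. */
  h->tmpbuf = NULL;
  h->schedule = NULL;
  model_handle_free(h); /* now a no-op for schedule and tmpbuf */
  return -1;
}

/* The caller keeps its existing cleanup, which is now the only one. */
static int model_iallreduce(model_schedule *s, void *tmpbuf) {
  model_handle h = { NULL, NULL };
  int res = model_schedule_request_patched(&h, s, tmpbuf);
  if (res != 0) {
    model_release(s);
    model_free_buf(tmpbuf);
  }
  return res;
}
```

With the patched path, the same call leaves the refcount at 0 and tmpbuf_frees at 1: the handle no longer frees anything, so the caller's OBJ_RELEASE(schedule)/free(tmpbuf) is the single cleanup. Clearing the handle's pointers leaves every caller's error path as written, which is presumably why it suits a temporary workaround.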
