Thanks for the quick feedback. I opened an issue here: https://github.com/open-mpi/ompi/issues/5383
Clyde Stanfield
Software Engineer
734-480-5100 office
clyde.stanfi...@mdaus.com

-----Original Message-----
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Nathan Hjelm via users
Sent: Friday, July 06, 2018 10:57 AM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Nathan Hjelm <hje...@me.com>
Subject: Re: [OMPI users] MPI_Ialltoallv

No, that's a bug. Please open an issue on GitHub and we will fix it shortly. Thanks for reporting this issue.

-Nathan

> On Jul 6, 2018, at 8:08 AM, Stanfield, Clyde
> <clyde.stanfi...@radiantsolutions.com> wrote:
>
> We are using MPI_Ialltoallv for an image processing algorithm. When doing
> this we pass in an MPI_Type_contiguous with an MPI_Datatype of
> MPI_C_FLOAT_COMPLEX, which ends up being the size of multiple rows of the
> image (based on the number of nodes used for distribution). In addition,
> sendcounts, sdispls, recvcounts, and rdispls all fit within a signed int.
> Usually this works without any issues, but when we lower our number of
> nodes we sometimes see failures.
>
> What we found is that even though we can fit everything into signed ints,
> line 528 of nbc_internal.h ends up calling malloc with an int that appears
> to be the size of (num_distributed_rows * num_columns *
> sizeof(std::complex<float>)), which in very large cases wraps around to a
> negative value. As a result we see "Error in malloc()" (line 530 of
> nbc_internal.h) throughout our output.
>
> We can work around this issue by ensuring the total size of our contiguous
> type never exceeds 2 GB. However, this was unexpected, as our understanding
> was that as long as we can fit all the parts into signed ints we should be
> able to transfer more than 2 GB at a time. Is it intended that
> MPI_Ialltoallv requires the underlying data to be less than 2 GB, or is
> this an error in how malloc is being called (should it be called with a
> size_t instead of an int)?
>
> Thanks,
> Clyde Stanfield
>
> Clyde Stanfield
> Software Engineer
> 734-480-5100 office
> clyde.stanfi...@mdaus.com

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users