Issue #8290 reported. Thanks all for your help and the workaround provided.
Patrick

On 14/12/2020 at 17:40, Jeff Squyres (jsquyres) wrote:
> Yes, opening an issue would be great -- thanks!
>
>> On Dec 14, 2020, at 11:32 AM, Patrick Bégou via users
>> <users@lists.open-mpi.org> wrote:
>>
>> OK, thanks Gilles.
>> Does it still require that I open an issue for tracking?
>>
>> Patrick
>>
>> On 14/12/2020 at 14:56, Gilles Gouaillardet via users wrote:
>>> Hi Patrick,
>>>
>>> Glad to hear you are now able to move forward.
>>>
>>> Please keep in mind this is not a fix but a temporary workaround.
>>> At first glance, I did not spot any issue in the current code.
>>> It turned out that the memory leak disappeared when doing things
>>> differently.
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Mon, Dec 14, 2020 at 7:11 PM Patrick Bégou via users
>>> <users@lists.open-mpi.org> wrote:
>>>
>>> Hi Gilles,
>>>
>>> you caught the bug! With this patch, on a single node, the memory
>>> leak disappears. The cluster is actually overloaded; as soon as
>>> possible I will launch a multi-node test.
>>> Below is the memory used by rank 0 before (blue) and after (red)
>>> the patch.
>>>
>>> Thanks
>>>
>>> Patrick
>>>
>>> <patch.png>
>>>
>>> On 10/12/2020 at 10:15, Gilles Gouaillardet via users wrote:
>>>> Patrick,
>>>>
>>>> First, thank you very much for sharing the reproducer.
>>>>
>>>> Yes, please open a GitHub issue so we can track this.
>>>>
>>>> I cannot fully understand where the leak is coming from, but so far:
>>>>
>>>> - the code fails on master built with --enable-debug (the data
>>>> engine reports an error) but not with the v3.1.x branch
>>>> (this suggests there could be an error in the latest Open MPI
>>>> ... or in the code)
>>>>
>>>> - the attached patch seems to have a positive effect; can you
>>>> please give it a try?
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> On 12/7/2020 6:15 PM, Patrick Bégou via users wrote:
>>>>> Hi,
>>>>>
>>>>> I've written a small piece of code to show the problem. It is based
>>>>> on my application, but 2D and using integer arrays for testing.
>>>>> The figure below shows the max RSS size of the rank 0 process over
>>>>> 20000 iterations on 8 and 16 cores, with the openib and tcp drivers.
>>>>> The more processes I have, the larger the memory leak. I use
>>>>> the same binaries for the 4 runs and OpenMPI 3.1 (same
>>>>> behavior with 4.0.5).
>>>>> The code is in attachment. I'll try to check type deallocation
>>>>> as soon as possible.
>>>>>
>>>>> Patrick
>>>>>
>>>>> On 04/12/2020 at 01:34, Gilles Gouaillardet via users wrote:
>>>>>> Patrick,
>>>>>>
>>>>>> based on George's idea, a simpler check is to retrieve the
>>>>>> Fortran index via the (standard) MPI_Type_c2f() function
>>>>>> after you create a derived datatype.
>>>>>>
>>>>>> If the index keeps growing forever even after you call
>>>>>> MPI_Type_free(), then this clearly indicates a leak.
>>>>>> Unfortunately, this simple test cannot be used to definitively
>>>>>> rule out any memory leak.
>>>>>>
>>>>>> Note you can also
>>>>>>
>>>>>> mpirun --mca pml ob1 --mca btl tcp,self ...
>>>>>>
>>>>>> in order to force communications over TCP/IP and hence rule
>>>>>> out any memory leak that could be triggered by your fast
>>>>>> interconnect.
>>>>>>
>>>>>> In any case, a reproducer will greatly help us debug this
>>>>>> issue.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Gilles
>>>>>>
>>>>>> On 12/4/2020 7:20 AM, George Bosilca via users wrote:
>>>>>>> Patrick,
>>>>>>>
>>>>>>> I'm afraid there is no simple way to check this. The main
>>>>>>> reason is that OMPI uses handles for MPI objects, and
>>>>>>> these handles are not tracked by the library; they are
>>>>>>> supposed to be provided by the user for each call.
>>>>>>> In your case, as you already called MPI_Type_free on the
>>>>>>> datatype, you cannot produce a valid handle.
>>>>>>>
>>>>>>> There might be a trick. If the datatype is manipulated with
>>>>>>> any Fortran MPI functions, then we convert the handle (which
>>>>>>> in fact is a pointer) to an index into a pointer array
>>>>>>> structure. Thus, the index will remain used, and can
>>>>>>> therefore be used to convert back into a valid datatype
>>>>>>> pointer, until OMPI completely releases the datatype. Look
>>>>>>> into the ompi_datatype_f_to_c_table table to see the
>>>>>>> datatypes that exist and get their pointers, and then use
>>>>>>> these pointers as arguments to ompi_datatype_dump() to see
>>>>>>> if any of these existing datatypes are the ones you define.
>>>>>>>
>>>>>>> George.
>>>>>>>
>>>>>>> On Thu, Dec 3, 2020 at 4:44 PM Patrick Bégou via users
>>>>>>> <users@lists.open-mpi.org> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm trying to solve a memory leak that has appeared since my
>>>>>>> new implementation of communications based on MPI_Alltoallw
>>>>>>> and MPI_Type_create_subarray calls. Arrays of subarray types
>>>>>>> are created/destroyed at each time step and used for
>>>>>>> communications.
>>>>>>>
>>>>>>> On my laptop the code runs fine (running 15000 temporal
>>>>>>> iterations on 32 processes with oversubscription), but on our
>>>>>>> cluster the memory used by the code increases until the
>>>>>>> OOM killer stops the job. On the cluster we use IB QDR for
>>>>>>> communications.
>>>>>>>
>>>>>>> Same Gcc/Gfortran 7.3 (built from sources), same sources of
>>>>>>> OpenMPI (3.1 or 4.0.5 tested), same sources of the
>>>>>>> Fortran code on the laptop and on the cluster.
>>>>>>>
>>>>>>> Using Gcc/Gfortran 4.8 and OpenMPI 1.7.3 on the cluster does
>>>>>>> not show the problem (resident memory does not increase, and
>>>>>>> we ran 100000 temporal iterations).
>>>>>>>
>>>>>>> The MPI_Type_free manual says that it "marks the datatype
>>>>>>> object associated with datatype for deallocation". But how
>>>>>>> can I check that the deallocation is really done?
>>>>>>>
>>>>>>> Thanks for any suggestions.
>>>>>>>
>>>>>>> Patrick
>
> --
> Jeff Squyres
> jsquy...@cisco.com
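[Editor's note] Gilles' leak check from the thread can be sketched as follows. This is a minimal Fortran sketch (sizes and names are illustrative, not from the thread): in Open MPI, a datatype handle that has been touched by the Fortran bindings is an index into the f2c translation table, so repeatedly creating and freeing the same derived type and watching the handle value reveals whether table entries are being released. From C one would obtain the same index with MPI_Type_c2f() on the C handle.

```fortran
program check_type_leak
  use mpi
  implicit none
  integer :: ierr, newtype, i
  integer :: sizes(2), subsizes(2), starts(2)

  call MPI_Init(ierr)
  sizes    = (/ 64, 64 /)   ! illustrative global extents
  subsizes = (/ 32, 32 /)
  starts   = (/ 0, 0 /)     ! subarray starts are zero-based

  ! Create and free the same derived datatype repeatedly. If the
  ! handle (the Fortran index) keeps growing across iterations even
  ! though MPI_Type_free is called each time, table entries are
  ! leaking; if it stabilizes, the indices are being recycled.
  do i = 1, 10
     call MPI_Type_create_subarray(2, sizes, subsizes, starts, &
          MPI_ORDER_FORTRAN, MPI_INTEGER, newtype, ierr)
     call MPI_Type_commit(newtype, ierr)
     print *, 'iteration', i, 'handle index =', newtype
     call MPI_Type_free(newtype, ierr)
  end do

  call MPI_Finalize(ierr)
end program check_type_leak
```

As Gilles notes above, a stable index does not definitively rule out a leak of the underlying datatype object, only of the translation-table entries.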
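[Editor's note] The communication pattern Patrick describes (subarray types built and destroyed around MPI_Alltoallw every time step) looks roughly like the following. This is a hedged sketch, not the attached reproducer: it assumes a 1-D column-block decomposition, that the second extent divides evenly by the number of peers, and illustrative names throughout.

```fortran
subroutine timestep_exchange(a, b, np, comm)
  use mpi
  implicit none
  integer, intent(in)  :: np, comm     ! peers and communicator
  integer, intent(in)  :: a(:,:)       ! send field
  integer, intent(out) :: b(:,:)       ! receive field
  integer :: ierr, p
  integer :: types(np), counts(np), displs(np)
  integer :: sizes(2), subsizes(2), starts(2)

  sizes  = shape(a)
  counts = 1          ! one subarray block per peer
  displs = 0          ! offsets are carried by the types themselves

  ! Build one subarray type per peer (symmetric layout assumed, so
  ! the same types serve for send and receive in this sketch).
  do p = 1, np
     subsizes = (/ sizes(1), sizes(2) / np /)
     starts   = (/ 0, (p - 1) * subsizes(2) /)
     call MPI_Type_create_subarray(2, sizes, subsizes, starts, &
          MPI_ORDER_FORTRAN, MPI_INTEGER, types(p), ierr)
     call MPI_Type_commit(types(p), ierr)
  end do

  call MPI_Alltoallw(a, counts, displs, types, &
                     b, counts, displs, types, comm, ierr)

  ! Free every committed type each time step. MPI_Type_free only
  ! *marks* the object for deallocation -- which is exactly where
  ! the leak reported in this thread shows up.
  do p = 1, np
     call MPI_Type_free(types(p), ierr)
  end do
end subroutine timestep_exchange
```

If the per-peer layouts do not change between time steps, a simple workaround is to hoist the type creation out of the time loop and free the types once at the end, which sidesteps the repeated create/free cycle entirely.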