Some folks from ORNL did some studies on OMPI memory usage a few years ago, but I am not sure whether those studies are openly available. OMPI itself manages the MCA parameters, user-facing requests, unexpected messages, and temporary buffers for collectives and I/O. Those allocations are, I might be slightly extrapolating here, roughly linear in the number of communicators and in the number of outstanding non-blocking and persistent requests.
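If you want to see that dependence for yourself, a quick and dirty sketch along these lines should show it: duplicate communicators in a loop and watch the process RSS grow (read from /proc/self/status, so Linux only). The absolute numbers depend entirely on your build and transport, so treat it as an illustration rather than a measurement tool.

/* comm_rss.c: duplicate communicators and watch the process RSS grow.
 * Illustration only; reads VmRSS from /proc/self/status (Linux). */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

static long vmrss_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;
    if (!f) return -1;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "VmRSS:", 6) == 0) {
            sscanf(line + 6, "%ld", &kb);
            break;
        }
    }
    fclose(f);
    return kb;
}

int main(int argc, char **argv)
{
    MPI_Comm dups[1000];
    int rank, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) printf("after MPI_Init:     %ld kB\n", vmrss_kb());

    for (i = 0; i < 1000; i++) {
        MPI_Comm_dup(MPI_COMM_WORLD, &dups[i]);   /* collective, all ranks */
        if (rank == 0 && (i + 1) % 250 == 0)
            printf("%4d communicators:  %ld kB\n", i + 1, vmrss_kb());
    }

    for (i = 0; i < 1000; i++) MPI_Comm_free(&dups[i]);
    MPI_Finalize();
    return 0;
}

The same kind of loop over MPI_Send_init / MPI_Recv_init would show the per-request cost of persistent requests.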
As a general statement, low-level communication libraries are not supposed to use much memory, and the amount should be capped to some extent (logically, by the number of endpoints or connections). In particular, UCX has a memory tracking mechanism similar to OMPI's, via ucs_malloc and friends. Take a look at ucs/debug/memtrack.c to figure out how to enable it (maybe enabling statistics, a.k.a. ENABLE_STATS, is enough).

George.

On Mon, Apr 17, 2023 at 1:16 PM Brian Dobbins <bdobb...@gmail.com> wrote:

> Hi George,
>
> Got it, thanks for the info - I naively hadn't even considered that of course all the related libraries likely have their *own* allocators. So, for *OpenMPI*, it sounds like I can use my own opal_[mc]alloc calls, with a new build that turns memory debugging on, to tally up and report the total size of OpenMPI allocations, and that seems pretty straightforward. But I'd guess that for a data-heavy MPI application, the majority of the memory will be in transport-level buffers, and that's (for me) likely the UCX layer, so I should look to that community / code for quantifying how large those buffers get inside my application?
>
> Thanks again, and apologies for what is surely a woeful misuse of the correct terminology here on some of this stuff.
>
> - Brian
>
>
> On Mon, Apr 17, 2023 at 11:05 AM George Bosilca <bosi...@icl.utk.edu> wrote:
>
>> Brian,
>>
>> OMPI does not have an official mechanism to report how much memory OMPI allocates. But, there is hope:
>>
>> 1. We have a mechanism to help debug memory issues (OPAL_ENABLE_MEM_DEBUG). You could enable it and then provide your own flavor of memory tracking in opal/util/malloc.c.
>> 2. You can use a traditional malloc-trapping mechanism (valgrind, malt, mtrace, ...), inspect the stack to detect where each allocation was issued, and then count.
>>
>> The first approach would only give you the memory used by OMPI itself, not the other libraries we are using (PMIx, HWLOC, UCX, ...). The second might be a little more generic, but it depends on external tools and might take a little time to set up.
>>
>> George.
>>
>>
>> On Fri, Apr 14, 2023 at 3:31 PM Brian Dobbins via users <users@lists.open-mpi.org> wrote:
>>
>>> Hi all,
>>>
>>> I'm wondering if there's a simple way to get statistics from OpenMPI as to how much memory the *MPI* layer in an application is taking. For example, I'm running a model and I can get the RSS size at various points in the code, and that reflects the user data for the application, *plus*, surely, buffers for MPI messages that are either allocated at runtime or, maybe, a pool from start-up. The memory use - which I assume is tied to internal buffers? - differs considerably with *how* I run MPI - e.g., TCP vs UCX, and with UCX, UD vs RC mode.
>>>
>>> Here's an example of this:
>>>
>>> 60km (163842 columns), 2304 ranks [OpenMPI]
>>> UCX Transport Changes (environment variable)
>>> (No recompilation; all runs done on same nodes)
>>> Showing memory after ATM-TO-MED Step
>>> [RSS Memory in MB]
>>>
>>> Standard Decomposition
>>> UCX_TLS value    ud        default    rc
>>> Run 1            347.03    392.08     750.32
>>> Run 2            346.96    391.86     748.39
>>> Run 3            346.89    392.18     750.23
>>>
>>> I'd love a way to trace how much *MPI alone* is using, since here I'm still measuring the *process's* RSS.
>>>
>>> My feeling is that if, for example, I'm running on N nodes and have a 1 GB dataset + (for the sake of discussion) 100 MB of MPI info, then at 2N, with good scaling of the domain memory, that's 500 MB + 100 MB; at 4N it's 250 MB + 100 MB; and eventually, at 16N, the MPI memory dominates. As a result, when we scale out, even with perfect scaling of the *domain* memory, at some point the memory associated with MPI will cause this curve to taper off, and potentially invert. But I'm admittedly *way* out of date on how modern MPI implementations allocate buffers.
>>>
>>> In short, any tips on ways to better characterize MPI memory use would be *greatly* appreciated! If this is purely on the UCX (or other transport) level, that's good to know too.
>>>
>>> Thanks,
>>> - Brian
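P.S. On the "traditional malloc trapping" route (option 2 in my earlier message): if valgrind is heavier than you need, glibc's mtrace facility is probably the lowest-effort starting point. Bracket the region you care about with mtrace()/muntrace(), point MALLOC_TRACE at a file, and then summarize (or just sum the sizes in) the log with the mtrace script that ships with glibc. Note that it records every allocation made in that window, from OMPI, PMIx, HWLOC, UCX and anything else underneath, which is exactly what makes it the more generic of the two options. A minimal sketch:

/* mtrace_init.c: log every allocation made during MPI_Init via glibc mtrace.
 * Run with MALLOC_TRACE=/tmp/ompi_alloc.log in the environment, then
 * post-process the log with the 'mtrace' script shipped with glibc. */
#include <mcheck.h>
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    mtrace();                  /* start logging malloc/realloc/free */
    MPI_Init(&argc, &argv);    /* everything allocated in here is recorded */
    muntrace();                /* stop logging */

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        printf("MPI_Init allocations logged to the MALLOC_TRACE file\n");

    MPI_Finalize();
    return 0;
}

With more than one rank per node you would want a per-rank MALLOC_TRACE (set by a small wrapper script around the executable, for example), otherwise the ranks will clobber each other's log.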