Re: [OMPI users] Q: Getting MPI-level memory use from OpenMPI?
Brian,

OMPI does not have an official mechanism to report how much memory OMPI allocates. But there is hope:

1. We have a mechanism to help debug memory issues (OPAL_ENABLE_MEM_DEBUG). You could enable it and then provide your own flavor of memory tracking in opal/util/malloc.c.
2. You can use a traditional malloc-trapping mechanism (valgrind, malt, mtrace, ...), inspect the stack to detect where each allocation was issued, and then count.

The first approach would only give you the memory used by OMPI itself, not the other libraries we are using (PMIx, HWLOC, UCX, ...). The second might be a little more generic, but depends on external tools and might take a little time to set up.

  George.

On Fri, Apr 14, 2023 at 3:31 PM Brian Dobbins via users <users@lists.open-mpi.org> wrote:

> Hi all,
>
> I'm wondering if there's a simple way to get statistics from OpenMPI as to how much memory the *MPI* layer in an application is taking. For example, I'm running a model and I can get the RSS size at various points in the code, and that reflects the user data for the application, *plus*, surely, buffers for MPI messages that are either allocated at runtime or, maybe, a pool from start-up. The memory use (which I assume is tied to internal buffers?) differs considerably with *how* I run MPI - e.g., TCP vs UCX, and, with UCX, UD vs RC mode.
>
> Here's an example of this:
>
> 60km (163842 columns), 2304 ranks [OpenMPI]
> UCX transport changed via environment variable (no recompilation; all runs done on the same nodes)
> Memory shown after the ATM-TO-MED step [RSS memory in MB]
>
> Standard decomposition
>   UCX_TLS value     ud        default    rc
>   Run 1             347.03    392.08     750.32
>   Run 2             346.96    391.86     748.39
>   Run 3             346.89    392.18     750.23
>
> I'd love a way to trace how much *MPI alone* is using, since here I'm still measuring the *process's* RSS. My feeling is that if, for example, I'm running on N nodes and have a 1 GB dataset + (for the sake of discussion) 100 MB of MPI info, then at 2N, with good scaling of domain memory, that's 500 MB + 100 MB; at 4N it's 250 MB + 100 MB; and eventually, at 16N, the MPI memory dominates. As a result, when we scale out, even with perfect scaling of *domain* memory, at some point the memory associated with MPI will cause this curve to taper off, and potentially invert. But I'm admittedly *way* out of date on how modern MPI implementations allocate buffers.
>
> In short, any tips on ways to better characterize MPI memory use would be *greatly* appreciated! If this is purely at the UCX (or other transport) level, that's good to know too.
>
> Thanks,
>   - Brian
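A minimal sketch of the second approach, assuming glibc: its mtrace() facility logs every allocation and free, together with the caller's return address, to the file named by MALLOC_TRACE, and the log can then be post-processed (e.g. with addr2line) to attribute bytes to libopen-pal, libucx, and so on. The wrapper below is illustrative only, not part of OMPI.

    /* Hedged sketch: trap allocations externally with glibc's mtrace().
     * Run with MALLOC_TRACE=./alloc.log set in the environment. */
    #include <mcheck.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        mtrace();                 /* start logging every malloc/free */

        MPI_Init(&argc, &argv);
        /* ... application work, e.g. up to the ATM-TO-MED step ... */
        MPI_Finalize();

        muntrace();               /* stop logging before exit */
        return 0;
    }

The log records each allocation against a caller address; mapping those addresses back to the shared object that issued them is what turns the raw trace into a per-library total.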
Re: [OMPI users] Q: Getting MPI-level memory use from OpenMPI?
Hi George,

Got it, thanks for the info - I naively hadn't even considered that of course all the related libraries likely have their *own* allocators. So, for *OpenMPI*, it sounds like I can add my own tracking to the opal_[mc]alloc calls, with a new build that turns memory debugging on, to tally up and report the total size of OpenMPI allocations, and that seems pretty straightforward. But I'd guess that for a data-heavy MPI application, the majority of the memory will be in transport-level buffers, and that's (for me) likely the UCX layer, so I should look to that community / code for quantifying how large those buffers get inside my application?

Thanks again, and apologies for what is surely a woeful misuse of the correct terminology here on some of this stuff.

  - Brian
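A rough sketch of what such a tally might look like; the names and the hook point are assumptions, since OMPI's actual debug allocation path in opal/util/malloc.c may expose different entry points.

    /* Hypothetical byte counter that a custom tracker built into an
     * OPAL_ENABLE_MEM_DEBUG build could maintain; names are illustrative. */
    #include <stdatomic.h>
    #include <stddef.h>
    #include <stdio.h>

    static atomic_size_t ompi_bytes_allocated;  /* cumulative bytes requested */
    static atomic_size_t ompi_alloc_calls;      /* number of allocation calls */

    /* Call from the debug allocation path with the requested size. */
    static inline void track_ompi_alloc(size_t size)
    {
        atomic_fetch_add(&ompi_bytes_allocated, size);
        atomic_fetch_add(&ompi_alloc_calls, 1);
    }

    /* Call at a point of interest, e.g. right after the ATM-TO-MED step. */
    void report_ompi_alloc_totals(void)
    {
        printf("OMPI allocations so far: %zu bytes in %zu calls\n",
               (size_t)atomic_load(&ompi_bytes_allocated),
               (size_t)atomic_load(&ompi_alloc_calls));
    }

Note that this counts cumulative bytes requested; reporting *current* usage would also require hooking the free path and recording each allocation's size.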
Re: [OMPI users] Q: Getting MPI-level memory use from OpenMPI?
Some folks from ORNL did some studies of OMPI memory usage a few years ago, but I am not sure whether those studies are openly available.

OMPI manages all the MCA parameters, user-facing requests, unexpected messages, and temporary buffers for collectives and IO. Those are (I might be slightly extrapolating here) roughly linearly dependent on the number of existing communicators and of non-blocking and persistent requests.

As a general statement, low-level communication libraries are not supposed to use much memory, and the amount should be capped to some extent (logically, by the number of endpoints or connections).

In particular, UCX has a memory-tracking mechanism similar to OMPI's, via ucs_malloc and friends. Take a look at ucs/debug/memtrack.c to figure out how to enable it (maybe enabling statistics, a.k.a. ENABLE_STATS, is enough).

  George.
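For the UCX side, a hedged sketch of how those reports might be requested from within the application: the environment variable names (UCX_STATS_DEST, UCX_STATS_TRIGGER, UCX_MEMTRACK_DEST) are taken from UCX documentation, assume a UCX build configured with statistics and memory tracking, and should be verified against the installed version. Exporting them in the job script works just as well as setting them in code.

    /* Sketch: ask UCX to print its own statistics and memtrack reports.
     * Assumes a UCX build with stats/memtrack support; variable names are
     * assumptions to be checked against the installed UCX version. */
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        /* Must be set before UCX initializes, i.e. before MPI_Init. */
        setenv("UCX_STATS_DEST",    "stdout", 0);   /* where to send stats  */
        setenv("UCX_STATS_TRIGGER", "exit",   0);   /* dump stats at exit   */
        setenv("UCX_MEMTRACK_DEST", "stdout", 0);   /* memtrack report, too */

        MPI_Init(&argc, &argv);
        /* ... application ... */
        MPI_Finalize();
        return 0;
    }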