Re: Extend Native Memory Tracking over the JDK ? (was: Proposal: track zlib native memory usage with NMT)

Thomas Stüfe Mon, 05 Dec 2022 04:44:18 -0800

Thank you for the positive encouragement, Roman :-)

Cheers, Thomas


On Mon, Dec 5, 2022 at 12:03 PM Kennke, Roman <rken...@amazon.de> wrote:

> Hi Thomas,
>
> I very much like the idea and also your proposals for how to do it.
> Insight into the JDK's native memory usage is sorely lacking and would be
> very useful! I don't have all that much to add about the details beyond
> what you already covered, though :-)
>
> Cheers,
> Roman
>
>
> > Are there any opinions about whether or not to extend NMT across the JDK?
> >
> > This blocks https://bugs.openjdk.org/browse/JDK-8296360, and I had a PR
> > prepared as https://github.com/openjdk/jdk/pull/10988. Originally I was
> > hoping to get this into JDK 20, but I don't think that is realistic
> > anymore. I am fine with postponing my work in favor of a baseline
> > discussion, but so far there is very little discussion about this topic.
> >
> > How should I proceed?
> >
> > Thanks, Thomas
> >
> >
> >
> > On Wed, Nov 9, 2022 at 8:12 AM Thomas Stüfe <thomas.stu...@gmail.com>
> > wrote:
> >
> >     Hi Alan,
> >
> >     (replaced hotspot-runtime-dev with hotspot-dev, since it's more of a
> >     general topic)
> >
> >     thank you for your time!
> >
> >     I am very happy to talk this through. I think native memory
> >     observability in the JDK (and customer code!) is sorely lacking.
> >     Witness the countless "where did my native memory go" blog articles.
> >     At SAP we have been struggling with this topic for a long time and
> >     have come up with a mixture of solutions. The aforementioned tracker
> >     was one, which extended our version of NMT across the JDK. Our
> >     SapMachine MallocTracer, which allows us to trace uninstrumented
> >     customer code, another. We even experimented with exchanging the
> >     allocator (using jemalloc) to gain insights. But that is a whole
> >     different topic with deep logistical implications that I don't want
> >     to touch here. Exchanging the allocator does not help to observe
> >     virtual memory or the brk segment, of course.
> >
> >     And to make the picture complete, another insight we currently lack
> >     is the implicit allocator overhead, which can be very significant
> >     and is hidden by the libc. We also have observability for that in
> >     the SapMachine, and I miss it in OpenJDK.
> >
> >     As you noticed, my original intent was just to instrument Zlib and
> >     possibly improve tracking for DBBs. Although, thinking beyond that,
> >     another attractive instrumentation target would be mapped NIO
> >     buffers at least.
> >
> >     So I think native memory observability is important. Arguably we
> >     could even extend observability to cover other OS resources, e.g.
> >     file handles. If we shift code around, say to Java/Panama, data
> >     that moves to the Java heap does not need to be tracked, but other
> >     memory will always come from one of the basic system APIs,
> >     regardless of who allocates it and where in the stack the
> >     allocation happens, be it native JDK code, Panama, or even customer
> >     JNI code.
> >
> >     If we agree on the importance of native memory observability, then I
> >     believe NMT is the right tool for it. It is a good tool. The
> >     machinery is already there. It covers both C-heap and virtual memory
> >     APIs, as well as thread stacks, and could easily be extended to
> >     cover sbrk if needed. And I assume that whatever shape OpenJDK takes
> >     on in the future, there always will be a libjvm.so at its core, so
> >     we will always have it. But even if not, NMT could be separated from
> >     libjvm.so quite easily, since it has no deep ties with the JVM.
> >
> >     About coupling JVM with outside code: We don't have to directly link
> >     against libjvm.so. We can keep things loose if the intent is to be
> >     runnable without a JVM, or be JVM-version-agnostic. That could take
> >     the form of a function-pointer interface like JVMTI. Or outside code
> >     could dynamically dlsym the JVM allocation hooks. In any case
> >     gracefully falling back to system allocation routines when necessary.
> >
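
The loose coupling described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the hook table `nmt_hooks_t`, the functions `lib_set_nmt_hooks`, `lib_malloc` and `lib_free`, and the `demo_*` stand-ins are all hypothetical names, not an existing JVM or JDK interface. It shows a library that routes allocations through JVM-provided function pointers when they are present and falls back to the system allocator otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook table. Neither this struct nor these names exist in
 * OpenJDK; they only illustrate a JVMTI-like function-pointer interface. */
typedef struct {
    void *(*alloc)(size_t size, int tag);
    void  (*release)(void *p);
} nmt_hooks_t;

/* NULL means "no JVM attached": use the system allocator. */
static const nmt_hooks_t *g_hooks = NULL;

/* Install or remove (hooks == NULL) the table. This must happen while the
 * library holds no live blocks, so that every block is freed by the same
 * allocator that created it. Not thread-safe; a sketch only. */
void lib_set_nmt_hooks(const nmt_hooks_t *hooks) {
    assert(hooks == NULL || (hooks->alloc != NULL && hooks->release != NULL));
    g_hooks = hooks;
}

/* Allocate through the hooks if present, else fall back to malloc. */
void *lib_malloc(size_t size, int tag) {
    return g_hooks != NULL ? g_hooks->alloc(size, tag) : malloc(size);
}

/* Free through the hooks if present, else fall back to free. */
void lib_free(void *p) {
    if (g_hooks != NULL) {
        g_hooks->release(p);
    } else {
        free(p);
    }
}

/* Stand-in for the JVM side: counts requested bytes, then defers to malloc. */
size_t demo_tracked_bytes = 0;
static void *demo_alloc(size_t size, int tag) {
    (void)tag;
    demo_tracked_bytes += size;
    return malloc(size);
}
static void demo_release(void *p) { free(p); }
const nmt_hooks_t demo_hooks = { demo_alloc, demo_release };
```

Without a call to `lib_set_nmt_hooks`, the library behaves exactly as if it used `malloc` and `free` directly, which is the graceful fallback mentioned above. A dlsym-based variant would differ only in how the table is discovered.
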
> >     And I agree, polluting the NMT tag space with outside meaning is
> >     ugly. I only did it because I planned to go no further than
> >     instrumenting Zlib and possibly DBBs. But if we take this further,
> >     my preferred solution would be a reserved tag range (or ranges) for
> >     outside use, whose inner meaning would be opaque to the JVM. Kind of
> >     like SIGRTMIN..SIGRTMAX. Then, outside code could register tags and
> >     their meta information with the JVM, or we find a different way to
> >     convey the tag meaning to NMT (config files, or callbacks). That
> >     could even be opened up for customer use.
> >
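
A reserved tag range with registration, as proposed above, might look something like the following. Everything here is an assumption made for illustration: the range bounds `EXT_TAG_MIN` and `EXT_TAG_MAX`, the 31-character name limit, and the functions `ext_tag_register` and `ext_tag_name` do not exist in NMT. The point is only that the registry stores a name per tag without understanding what the tag means.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reserved range for tags owned by code outside the JVM,
 * by analogy with the real-time signal range. */
enum {
    EXT_TAG_MIN   = 200,
    EXT_TAG_MAX   = 255,
    EXT_TAG_COUNT = EXT_TAG_MAX - EXT_TAG_MIN + 1
};

typedef struct {
    char name[32];   /* meta information; opaque to the tracker itself */
    int  used;
} ext_tag_info_t;

static ext_tag_info_t g_ext_tags[EXT_TAG_COUNT];

/* Register a name and return its tag. Registering the same name twice
 * returns the same tag. Returns -1 for an invalid name or a full range.
 * Not thread-safe; a sketch only. */
int ext_tag_register(const char *name) {
    if (name == NULL || name[0] == '\0' ||
        strlen(name) >= sizeof(g_ext_tags[0].name)) {
        return -1;
    }
    for (int i = 0; i < EXT_TAG_COUNT; i++) {
        if (g_ext_tags[i].used && strcmp(g_ext_tags[i].name, name) == 0) {
            return EXT_TAG_MIN + i;
        }
    }
    for (int i = 0; i < EXT_TAG_COUNT; i++) {
        if (!g_ext_tags[i].used) {
            strcpy(g_ext_tags[i].name, name);   /* length checked above */
            g_ext_tags[i].used = 1;
            return EXT_TAG_MIN + i;
        }
    }
    return -1;
}

/* Look up the registered name for a tag, or NULL if the tag is outside
 * the reserved range or has not been registered. */
const char *ext_tag_name(int tag) {
    if (tag < EXT_TAG_MIN || tag > EXT_TAG_MAX) {
        return NULL;
    }
    const ext_tag_info_t *info = &g_ext_tags[tag - EXT_TAG_MIN];
    return info->used ? info->name : NULL;
}
```

A report generator would call `ext_tag_name` when printing totals, so tags outside the JVM's own set still show up under a readable name.
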
> >     This also touches on another question, that of NMT tag space. NMT
> >     tags are very useful since they allow cheap tracking without
> >     capturing call stacks. However, tags are underused and show growing
> >     pains since they are too one-dimensional and restrictive. We had
> >     competing interests in the past about tag granularity. It is all
> >     over the place. We have coarse-grained tags like "mtThread", and
> >     very fine-grained ones like "mtObjectMonitor". There are several
> >     ways we could improve, e.g., by making them combinable like UL does,
> >     or allowing for a hierarchy of them - either a hard-wired limited
> >     one like "domain"+"tag", or an unlimited tree-like one. Technically
> >     interesting since whatever the new encoding is, they still must fit
> >     into a malloc header. I opened
> >     https://bugs.openjdk.org/browse/JDK-8281819 to track ideas like
> >     these.
> >
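
To make the "domain"+"tag" idea concrete, here is a small sketch of a two-level encoding packed into a fixed-width field. The 16-bit width and the 6/10 bit split are assumptions chosen for the example and do not reflect the actual NMT malloc header layout; `tag_encode`, `tag_domain` and `tag_local` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical split of a 16-bit field: upper 6 bits select the domain,
 * lower 10 bits select the tag within that domain. */
enum { TAG_BITS = 10, DOMAIN_BITS = 6 };

#define TAG_MASK    ((1u << TAG_BITS) - 1u)
#define DOMAIN_MASK ((1u << DOMAIN_BITS) - 1u)

/* Pack a (domain, tag) pair into one 16-bit value. */
static inline uint16_t tag_encode(unsigned domain, unsigned tag) {
    assert(domain <= DOMAIN_MASK && tag <= TAG_MASK);
    return (uint16_t)((domain << TAG_BITS) | tag);
}

/* Extract the domain part of an encoded value. */
static inline unsigned tag_domain(uint16_t encoded) {
    return (unsigned)encoded >> TAG_BITS;
}

/* Extract the domain-local tag part of an encoded value. */
static inline unsigned tag_local(uint16_t encoded) {
    return (unsigned)encoded & TAG_MASK;
}
```

With such a layout, a coarse report can aggregate by `tag_domain` alone while a detailed report breaks each domain down by `tag_local`, and the per-allocation cost stays at a fixed two bytes.
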
> >     Instrumenting Panama allocations, including the ability to tag
> >     allocations, would be a very good idea. For instance, if we ever
> >     remove the native Zlib layer and convert it to Java using Panama, we
> >     can do with Panama the same thing I now do natively: use the Zlib
> >     zalloc interface to hook in JVM memory allocation functions. The
> >     result could be completely identical, and the end user looking at
> >     the NMT output need never know that anything changed.
> >
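
For readers unfamiliar with the zalloc mechanism mentioned above: zlib lets the caller supply allocation and free callbacks plus an opaque pointer that is handed back to them. The sketch below mirrors the shape of those callbacks (opaque pointer, item count, item size) without including zlib itself, and counts live bytes instead of calling into a JVM. The names `zstats_t`, `zhdr_t`, `tracked_zalloc` and `tracked_zfree` are hypothetical, and the size header in front of each block is an assumption of this example, not a description of the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Running totals, reached through the callbacks' opaque pointer. */
typedef struct {
    size_t live_bytes;
    size_t live_blocks;
} zstats_t;

/* Small header in front of each block so the free callback knows the
 * block size. The union keeps the payload suitably aligned. */
typedef union {
    size_t      size;
    max_align_t align;
} zhdr_t;

/* Same parameter shape as zlib's alloc_func: (opaque, items, size). */
void *tracked_zalloc(void *opaque, unsigned items, unsigned size) {
    zstats_t *stats = (zstats_t *)opaque;
    /* Reject requests whose byte count plus header would overflow. */
    if (size != 0 && items > (SIZE_MAX - sizeof(zhdr_t)) / size) {
        return NULL;
    }
    size_t bytes = (size_t)items * size;
    zhdr_t *hdr = (zhdr_t *)malloc(sizeof(zhdr_t) + bytes);
    if (hdr == NULL) {
        return NULL;
    }
    hdr->size = bytes;
    stats->live_bytes += bytes;
    stats->live_blocks++;
    return hdr + 1;   /* payload starts right after the header */
}

/* Same parameter shape as zlib's free_func: (opaque, address). */
void tracked_zfree(void *opaque, void *address) {
    if (address == NULL) {
        return;
    }
    zstats_t *stats = (zstats_t *)opaque;
    zhdr_t *hdr = (zhdr_t *)address - 1;
    stats->live_bytes -= hdr->size;
    stats->live_blocks--;
    free(hdr);
}
```

With zlib itself, functions of this shape are assigned to the `zalloc`, `zfree` and `opaque` fields of a `z_stream` before the stream is initialized, which is why the call site does not have to change when the allocator behind it does.
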
> >     And that goes for all instrumentation - if today we add it to JNI
> >     code, and that code gets removed tomorrow, we can add it to Panama
> >     code too. Unless data structures move to the heap, in which case
> >     there is no need to track them.
> >
> >     You mentioned that NMT was more of an in-house support tool. Our
> >     experience is different. Even though it was positioned as a tool for
> >     JVM developers, and we never cared about backward compatibility or
> >     consistency, it gets used a *lot* by our customers. We have to
> >     explain its output frequently. Also, many blog articles exist
> >     documenting its use. So, maybe it would be okay to elevate it to a
> >     user-facing tool since it seems to occupy that role anyway. We may
> >     also open up consumption of NMT results via Java APIs, or expose its
> >     results via MXBeans.
> >
> >     If this is to be a JEP, okay, but I'm afraid it would stall things a
> >     bit. I am interested in getting a simpler and quicker solution for
> >     older support releases at least, possibly based on my PR. I know
> >     that would be unconventional though.
> >
> >     Thank you,
> >
> >     Thomas
> >
> >
> >     On Sun, Nov 6, 2022 at 9:31 AM Alan Bateman
> >     <alan.bate...@oracle.com> wrote:
> >
> >         On 04/11/2022 16:54, Thomas Stüfe wrote:
> >          > Hi all,
> >          >
> >          > I am currently working on
> >          > https://bugs.openjdk.org/browse/JDK-8296360;
> >          > I was preparing the final PR [1], but then Alan asked me to
> >          > discuss this on core-libs first.
> >          >
> >          > Backstory:
> >          >
> >          > NMT tracks hotspot native allocations but does not cover the
> >          > JDK libraries (small exception: Unsafe.allocateMemory).
> >          > However, the native memory footprint of JDK libraries can be
> >          > significant. We have no in-VM tracker for these and need
> >          > tools like valgrind or our SapMachine MallocTracer [2] to
> >          > observe them.
> >
> >         Thanks for starting a discussion on this as this is a topic that
> >         requires agreement from several areas. If this is the start of
> >         something bigger, where you want to have all allocation sites in
> >         the libraries using NMT, then I think it needs a write-up, maybe
> >         a JEP.
> >
> >         For starters, I think it needs some agreement on using NMT for
> >         memory allocated outside of libjvm. You mentioned Unsafe as an
> >         exception but that is implemented in the VM so you get tracking
> >         for free, albeit I think all allocations are in the "mtOther"
> >         category.
> >
> >         A general concern is that it creates more coupling between the
> >         VM code and the libraries code. As you probably know, we've
> >         removed most of the dependences on JVM_* functions from non-core
> >         areas over many years. So I think that needs consideration as I
> >         assume we don't want memory/allocation.hpp declaring a dozen
> >         categories for allocations done in, say, the java.desktop
> >         module. Maybe your proposal will be strictly limited to
> >         java.base but even then, do we really want the VM even knowing
> >         about categories that are specific to zip compression or
> >         decompression?
> >
> >         There are probably longer term trends that should be part of the
> >         discussion too. One general trend is that "run time" is becoming
> >         more and more a hybrid of code in libjvm and the Java libraries.
> >         Lambdas, the module system, and the virtual threads
> >         implementation are a few examples from the last few releases.
> >         This comes with many "Java on Java" challenges, including
> >         serviceability where users of the platform will expect tools to
> >         just work and won't care where the code is. NMT is probably more
> >         for support teams and not something that most developers will
> >         ever use but I think it is part of the challenge of having
> >         serviceability solutions "just work".
> >
> >         In addition to having more of the Java runtime written in Java,
> >         there will likely be less JNI code in the future. It's very
> >         possible that the JNI code (including the JNI methods in libzip)
> >         will be replaced with code that uses Panama memory and linker
> >         APIs once they become permanent. The effect of that would be to
> >         have a lot of the memory allocations be tracked in the mtOther
> >         category again. Maybe integration with memory tracking should be
> >         looked at in conjunction with these APIs and this migration. I
> >         could imagine the proposed "Arena" API (MemorySession in Java
> >         19) having some integration with NMT and it might be interesting
> >         to look into that.
> >
> >         So yes, this topic does need broader discussion and it might be
> >         a bit premature to start with a PR for libzip without talking
> >         about the bigger picture first.
> >
> >         -Alan
> >
> >
> >
