Hi Thomas and Carter,

I opened up a PR for this to allow more specific comments on the implementation:
https://github.com/openjdk/jdk/pull/11449

If this discussion leads to us not wanting to proceed with the change I will withdraw the PR.

Some more comments below.

On 2022-12-01 08:26, Thomas Stüfe wrote:
Hi Carter, Stefan,

thank you, I think it is good to have this discussion, it is important.

Side note, the discussion steered away from my original question - whether to instrument the JDK with NMT. I still would love to discuss that, too.


Sorry for that :)

About opening NMT up for user consumption, that is of course possible. But I think the bigger question is which data we want to open for user consumption, and at what granularity. And what contracts do we enter when we do this.


To me this is not so much opening it up, but just making it much simpler to get the already available data (JFR instead of jcmd). I get your point that when we make it easier it will likely get more visibility and that could generate expectations. To me the contract on these events should not be much stricter than, for example, the contract we have on the format of GC logs. So we should not be locked down by this.

NMT was originally a hotspot-dev-centric tool. It has a lot of idiosyncrasies. Interpreting the results needs detailed knowledge about hotspot memory management. Some examples:

- its reports are not consistent across JDK versions, not even across different patch levels of the same JDK. So you cannot compare results, say, between JDK11 and 17.
- before a certain version X (I believe JDK 11), the full thread stacks were accounted for instead of just the in-use portion of the thread stacks. I remember reading blogs about how thread stack consumption went down when all that changed was NMT reporting.
- The memory sizes it shows may not have much to do with real RSS. It systematically underreports some things, since it omits libc overhead and retention, usage by system- and JNI libraries. But it also overreports things since it mostly (not always) accounts in terms of "committed" memory, which usually means mmap()ed or malloc()ed memory. But that is just committed, not physical memory, it does not translate to RSS usage directly. That memory may never be touched. OTOH NMT probes thread stacks with mincore(), so for that section, "committed" really means "physical".


I agree that NMT is a low-level tool and that it's not perfect. But in some cases I think it's the best way to see the memory consumption of the JVM. Especially since you can zoom in on certain areas.

I am fine with opening up NMT via JFR. But does this mean we have to be more consistent? Do we have to care about downward compatibility of NMT reports? Are we then still free to redesign the tag system (see my original mail) or will this tie us down with the current NMT tag system forever? As a negative example, JFR exposes metaspace allocator details (chunk statistics) which have been broken ever since JDK 16 when the underlying implementation changed.


I think a tag-based system for NMT would be awesome and it would be really sad if exposing the NMT information through JFR would stop us from doing this. Hopefully the only thing we need to do when improving NMT is to do CSRs. One possible way to avoid constraints even more would be to tag those events as "experimental" at first. This would signal that users should not rely on them.

Therefore I am curious about what end users use NMT really for.

@Carter: can you give us examples of which NMT sections had been particularly useful to you? Maybe we can define a subset to expose instead of exposing all tags. E.g. I can see thread stack usage being very useful, but things like ObjectMonitor footprint not so much.


I agree that not too many users would care about the ObjectMonitor footprint, but unless we get constrained by what we report I would like to report all. If there are constraints, this might be a good middle road.

Thanks,
Stefan

Cheers, Thomas




On Wed, Nov 30, 2022 at 9:45 PM Carter Kozak <cko...@ckozak.net> wrote:

    This looks fantastic, thank you so much! I can confirm that the
    proposed design would solve my use-case.

    I'd enjoy discussing the NMT event contract somewhere more specific
    to the implementation, but I don't want to muddle this thread with
    implementation details.

    Carter Kozak

    On Wed, Nov 30, 2022, at 03:37, Stefan Johansson wrote:
    Hi Carter,

    Your mail made me pick up an old item from my wishlist: to have native
    memory tracking information available in JFR recordings. When we, in GC,
    do improvements to decrease the native memory overhead of our
    algorithms, NMT is a very good tool to track the progress. We have
    scripts that sound very similar to what you describe and more than once
    I've been thinking about adding this information into JFR. But it has
    not been a priority and the greater value has been unclear.

    Hearing that others might also benefit from such a change, I had a
    discussion with the JFR team on how to best proceed with this. I have
    created a branch for this and will probably create a PR for it shortly,
    but I thought I would drop it here first:
    https://github.com/kstefanj/jdk/tree/8157023-jfr-events-for-nmt
    

    The change adds two new JFR events: one for the total usage and one for
    the usage of each memory type. These are sent only if Native Memory
    Tracking is turned on, and they are enabled in the default JFR profile
    with an interval of 1s. This might change during reviewing but it was a
    good starting point.

    With this you will be able to use JFR streaming to access the events
    from within your running process. I hope this will help your use cases
    and please let us know if you have any comments or suggestions.

    Thanks,
    Stefan
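
For readers who want to try the JFR streaming access described in the quoted message above, here is a minimal sketch. It rests on assumptions the thread does not confirm: the event names jdk.NativeMemoryUsage and jdk.NativeMemoryUsageTotal and the field names type, reserved and committed are taken to be what the change ended up exposing, and the class NmtStreamSketch and helper formatUsage are hypothetical names made up for illustration. The JVM has to be started with -XX:NativeMemoryTracking=summary, otherwise no events are sent.

```java
import java.time.Duration;
import jdk.jfr.consumer.RecordingStream;

class NmtStreamSketch {
    // Hypothetical helper: renders one sample as a log line, sizes in KiB.
    static String formatUsage(String type, long reservedBytes, long committedBytes) {
        return String.format("%s reserved=%dK committed=%dK",
                type, reservedBytes / 1024, committedBytes / 1024);
    }

    public static void main(String[] args) throws Exception {
        try (RecordingStream rs = new RecordingStream()) {
            // Assumed event names; the 1s period mirrors the default mentioned in the thread.
            rs.enable("jdk.NativeMemoryUsage").withPeriod(Duration.ofSeconds(1));
            rs.enable("jdk.NativeMemoryUsageTotal").withPeriod(Duration.ofSeconds(1));

            // One event per memory type; field names are assumptions.
            rs.onEvent("jdk.NativeMemoryUsage", e -> {
                Object type = e.getValue("type");
                System.out.println(formatUsage(String.valueOf(type),
                        e.getLong("reserved"), e.getLong("committed")));
            });

            // One event for the total usage.
            rs.onEvent("jdk.NativeMemoryUsageTotal", e ->
                    System.out.println(formatUsage("Total",
                            e.getLong("reserved"), e.getLong("committed"))));

            rs.startAsync();       // consume events on a background thread
            Thread.sleep(5_000);   // keep the process alive for a few samples
        }
    }
}
```

Run as, for example, java -XX:NativeMemoryTracking=summary NmtStreamSketch.java. On a JDK that does not have these events the program still runs but prints nothing.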
