Hi Thomas and Carter,
I opened up a PR for this to allow more specific comments on the
implementation:
https://github.com/openjdk/jdk/pull/11449
If this discussion leads to us not wanting to proceed with the change I
will withdraw the PR.
Some more comments below.
On 2022-12-01 08:26, Thomas Stüfe wrote:
Hi Carter, Stefan,
thank you, I think this is an important discussion to have.
Side note: the discussion has steered away from my original question,
whether to instrument the JDK with NMT. I would still love to discuss
that, too.
Sorry for that :)
About opening NMT up for user consumption: that is of course possible.
But I think the bigger question is which data we want to open up for user
consumption, at what granularity, and what contracts we enter into
when we do this.
To me this is not so much opening it up as making it much simpler
to get at the already available data (via JFR instead of jcmd). I get your
point that making it easier will likely give it more visibility, and
that could create expectations. To me the contract on these events
should not be much stricter than, for example, the contract we have on the
format of GC logs. So we should not be locked in by this.
NMT was originally a hotspot-dev-centric tool. It has a lot of
idiosyncrasies. Interpreting the results needs detailed knowledge about
hotspot memory management. Some examples:
- Its reports are not consistent across JDK versions, not even across
different patch levels of the same JDK. So you cannot compare results,
say, between JDK 11 and 17.
- Before a certain version X (I believe JDK 11), the full thread stacks
were accounted for instead of just the in-use portion of the thread
stacks. I remember reading blogs about how thread stack consumption went
down when the only thing that changed was NMT reporting.
- The memory sizes it shows may not have much to do with real RSS. It
systematically underreports some things, since it omits libc overhead
and retention, as well as memory used by system and JNI libraries. But it
also overreports things, since it mostly (not always) accounts in terms of
"committed" memory, which usually means mmap()ed or malloc()ed memory.
Committed memory is not physical memory and does not translate directly
into RSS usage; it may never be touched. On the other hand, NMT probes
thread stacks with mincore(), so for that section "committed" really
means "physical".
I agree that NMT is a low-level tool and that it's not perfect. But in
some cases I think it's the best way to see the memory consumption of
the JVM, especially since you can zoom in on specific areas.
I am fine with opening up NMT via JFR. But does this mean we have to be
more consistent? Do we have to care about backward compatibility of NMT
reports? Are we then still free to redesign the tag system (see my
original mail), or will this tie us to the current NMT tag system
forever? As a negative example, JFR exposes metaspace allocator details
(chunk statistics) which have been broken ever since JDK 16 when the
underlying implementation changed.
I think a tag-based system for NMT would be awesome, and it would be
really sad if exposing the NMT information through JFR stopped us
from doing it. Hopefully the only thing we need to do when improving
NMT is to file CSRs. One way to reduce the constraints even further would
be to mark those events as "experimental" at first. This would signal
that users should not rely on them.
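To illustrate what "experimental" means for a consumer, here is a minimal sketch using the public `jdk.jfr.Experimental` annotation on a hypothetical user-defined event. The event name `demo.HypotheticalNativeMemoryUsage` and its field are made up for illustration, and the sketch does not show how built-in JVM events are declared. Tools can read the marker through `EventType.getAnnotation(Experimental.class)` and hide or flag such events.

```java
import java.util.ArrayList;
import java.util.List;
import jdk.jfr.Event;
import jdk.jfr.EventType;
import jdk.jfr.Experimental;
import jdk.jfr.FlightRecorder;
import jdk.jfr.Label;
import jdk.jfr.Name;

public class ExperimentalEventCheck {

    // Hypothetical user-defined event standing in for an NMT event.
    // @Experimental marks it as subject to change without notice.
    @Name("demo.HypotheticalNativeMemoryUsage")
    @Label("Hypothetical Native Memory Usage")
    @Experimental
    public static class HypotheticalNativeMemoryUsage extends Event {
        @Label("Committed")
        long committed;
    }

    /** Registers the demo event and returns the names of all event types marked @Experimental. */
    public static List<String> experimentalEventNames() {
        FlightRecorder.register(HypotheticalNativeMemoryUsage.class);
        List<String> names = new ArrayList<>();
        for (EventType type : FlightRecorder.getFlightRecorder().getEventTypes()) {
            // A consumer can use this marker to hide the event or warn that it may change.
            if (type.getAnnotation(Experimental.class) != null) {
                names.add(type.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        experimentalEventNames().forEach(System.out::println);
    }
}
```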
Therefore I am curious about what end users really use NMT for.
@Carter: can you give us examples of which NMT sections have been
particularly useful to you? Maybe we can define a subset to expose
instead of exposing all tags. E.g. I can see thread stack usage being
very useful, but things like ObjectMonitor footprint not so much.
I agree that not too many users would care about the ObjectMonitor
footprint, but unless we get constrained by what we report, I would like
to report everything. If there are constraints, this might be a good middle ground.
Thanks,
Stefan
Cheers, Thomas
On Wed, Nov 30, 2022 at 9:45 PM Carter Kozak <cko...@ckozak.net> wrote:
This looks fantastic, thank you so much! I can confirm that the
proposed design would solve my use case.
I'd enjoy discussing the NMT event contract somewhere more specific
to the implementation, but I don't want to muddle this thread with
implementation details.
Carter Kozak
On Wed, Nov 30, 2022, at 03:37, Stefan Johansson wrote:
Hi Carter,
Your mail made me pick up an old item from my wishlist: to have native
memory tracking information available in JFR recordings. When we, in GC,
make improvements to decrease the native memory overhead of our
algorithms, NMT is a very good tool to track the progress. We have
scripts that sound very similar to what you describe, and more than once
I've thought about adding this information to JFR. But it has
not been a priority and the broader value has been unclear.
Hearing that others might also benefit from such a change, I discussed
with the JFR team how best to proceed with this. I have created a
branch for it and will probably create a PR shortly, but I thought I
would drop it here first:
https://github.com/kstefanj/jdk/tree/8157023-jfr-events-for-nmt
The change adds two new JFR events: one for the total usage and one for
the usage of each memory type. These are sent only if Native Memory
Tracking is turned on, and they are enabled in the default JFR profile
with an interval of 1s. This might change during review, but it was a
good starting point.
With this you will be able to use JFR streaming to access the events
from within your running process. I hope this helps your use cases;
please let us know if you have any comments or suggestions.
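Consuming the events in-process could look roughly like the sketch below, which uses the JFR streaming API (`jdk.jfr.consumer.RecordingStream`, available since JDK 14). The event name `jdk.NativeMemoryUsage` and its `type` and `committed` fields are assumptions; the mail does not name them, so check them against the final PR. The JVM must also run with NMT enabled (e.g. `-XX:NativeMemoryTracking=summary`); otherwise no events are sent and the returned list stays empty.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import jdk.jfr.consumer.RecordingStream;

public class NmtStreamSketch {

    // Assumed event name for the per-memory-type NMT event (not given in the mail).
    static final String NMT_EVENT = "jdk.NativeMemoryUsage";

    /** Streams per-type NMT events for the given duration and returns "type=committedBytes" strings. */
    static List<String> sampleCommittedPerType(Duration duration) {
        // Thread-safe list, since the handler runs on the stream's own thread.
        List<String> samples = new CopyOnWriteArrayList<>();
        try (RecordingStream rs = new RecordingStream()) {
            // A plain RecordingStream does not apply the default profile,
            // so enable the event explicitly with the 1s period from the mail.
            rs.enable(NMT_EVENT).withPeriod(Duration.ofSeconds(1));
            rs.onEvent(NMT_EVENT, event ->
                    samples.add(event.getString("type") + "=" + event.getLong("committed")));
            rs.startAsync();
            Thread.sleep(duration.toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return samples;
    }

    public static void main(String[] args) {
        for (String sample : sampleCommittedPerType(Duration.ofSeconds(3))) {
            System.out.println(sample);
        }
    }
}
```

Streaming in-process avoids the external polling scripts mentioned above, since the values arrive as typed events rather than jcmd text output that has to be parsed.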
Thanks,
Stefan