On 1/11/2023 6:52 PM, Yi Yang wrote:
Hi Ioi,
> I think there are overlaps between your proposal and existing tools. For
example, there are jcmd options such as VM.class_hierarchy and
VM.classes, etc.
> The Serviceability Agent can also be used to analyze the contents of
the class metadata.
Of course, we can keep adding jcmd commands such as jcmd
VM.method_counter and jcmd VM.aggregate_by_class_package to help with
diagnosis. A once-and-for-all alternative is to implement a rich,
well-formed metadata dump as this proposal describes. Third-party
parsers and platforms can then analyze the well-formed dump file and
provide many grouping/filtering options (grouping_by_package,
filter_linked, filter_force_inline; VM.class_hierarchy is essentially
an aggregation of VM.classes).
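As a rough illustration of the grouping/filtering idea, here is a
minimal Python sketch of what a third-party parser could do with such a
dump. The dump layout (a JSON array of class records) and the field
names "name" and "init_state" are assumptions made for this example
only; they are not a format anyone has agreed on.

```python
import json
from collections import defaultdict

# Hypothetical dump: a JSON array of class records. The field names
# ("name", "init_state") are assumptions made for this sketch only.
SAMPLE_DUMP = """
[
  {"name": "java/lang/String", "init_state": "fully_initialized"},
  {"name": "java/lang/Object", "init_state": "fully_initialized"},
  {"name": "com/example/Foo",  "init_state": "loaded"},
  {"name": "Test",             "init_state": "linked"}
]
"""


def package_of(internal_name):
    """Return the package part of an internal class name ('' if none)."""
    head, sep, _ = internal_name.rpartition("/")
    return head if sep else ""


def group_by_package(classes):
    """Map each package to the class names it contains, in dump order."""
    groups = defaultdict(list)
    for c in classes:
        groups[package_of(c["name"])].append(c["name"])
    return dict(groups)


def filter_linked(classes):
    """Keep only classes whose state is at or past 'linked'."""
    linked_states = {"linked", "being_initialized", "fully_initialized"}
    return [c for c in classes if c.get("init_state") in linked_states]


classes = json.loads(SAMPLE_DUMP)
by_package = group_by_package(classes)
linked_names = [c["name"] for c in filter_linked(classes)]
```

With a single well-formed dump, each new view becomes a small function
on the consumer side rather than a new jcmd command in the JVM.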
To illustrate the benefits of a well-formed metaspace dump, here is a
real use case: on our internal DevOps platform, I observed that the
Metaspace utilization of my application had been high, and several
full GCs (FGC) occurred during this period. I then requested a
well-formed metaspace dump through the DevOps platform. The dump file
is generated automatically and uploaded to another internal Java
troubleshooting platform, which analyzes it further and presents it
with many grouping and filtering options.
> I'd be interested in seeing your implementation and compare it with
the existing tools.
I'm starting to do this, and it may take several months to implement,
since it looks more like a JEP-level feature. I would like to hear
some general discussion before coding, e.g., is it acceptable to use
the JSON format? Should it be a metadata dump, or should it keep the
current metaspace scope? Do you think a basic+extended output for the
internal structure is acceptable?
Before discussing the output of this tool, I think it's better to first
discuss the goals and intended use:
- For Java app developers, I am not sure if they care about the
representation of the classes inside HotSpot. They may want to know what
classes are loaded in what class loaders, or want to troubleshoot
memory leaks (why aren't my classes unloaded, etc). For these, we
already have existing tools.
- For HotSpot developers, it would be nice to have a dump of all the
metadata, but I am not sure how important this is, as people seem to be
able to get by with their own debugging methods.
By the way, there may be multiple ways of creating such a dump. The
least intrusive way would be to program the Serviceability Agent, which
already has a lot of Java APIs to access HotSpot internals. That way,
you can write the dumper without modifying the HotSpot C++ code. It
could even be maintained as a project outside of the JDK repo.
Also you mentioned that "Internally we implemented a metaspace dump that
generates human-readable text". Can you share how this tool was implemented?
Thanks
- Ioi
> This may be quite difficult, because the metadata contains rewritten
Java bytecodes. The rewriting format may be dependent on the JDK
version. Also, the class linkage (the resolution of constant pool
information) will be vastly different from one JDK version to another.
So writing a third party tool that can work with multiple JDK versions
will be quite hard.
Thanks for your input! Maybe display the rewritten bytecodes instead?
Anyway, I'll take a close look at this, and I'll prepare a POC along
with a dump parser and a simple web UI for diagnosis once ready.
> Also, defining a "portable" format for the dump will be difficult,
since we don't know how the internal data structure will evolve in the
future.
Yes. Since we don't know how the internal data structures will change
in the future, I propose reaching a consensus that the dump should at
least allow reconstructing the Java (rewritten?) source code as far as
possible. For example, the dumped JSON object for InstanceKlass
contains two parts: the first part contains the information needed to
reconstruct the source code as far as possible, and the second part is
extended information, like this:
{
name:...,
super:...,
flags:...,
method:[],
interface:[],
fields:[],
annotation:[],
bytecode:[],
constantpool:[],
//extended
init_state:...,
init_thread:...,
}
The first part is basically stable (new fields may only be added),
while the extended part is subject to change. A visualization client
for the dump checks whether each field of a JSON object is defined
before displaying it.
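To make the "check whether the field is defined" idea concrete, here is
a minimal Python sketch of a tolerant consumer. The field names come
from the InstanceKlass outline above; the exact split into BASE_FIELDS
and EXTENDED_FIELDS, the describe() helper, and the sample values are
assumptions made for this example.

```python
import json

# Field names follow the InstanceKlass outline in the message above. The
# split into BASE_FIELDS / EXTENDED_FIELDS and the sample values are
# assumptions made for this sketch.
BASE_FIELDS = ("name", "super", "flags")
EXTENDED_FIELDS = ("init_state", "init_thread")


def describe(klass):
    """Render one class record; base fields are required, extended optional."""
    missing = [f for f in BASE_FIELDS if f not in klass]
    if missing:
        raise ValueError(f"missing base fields: {missing}")
    lines = [f"{f} = {klass[f]}" for f in BASE_FIELDS]
    for f in EXTENDED_FIELDS:
        # Extended fields may come and go between JDK versions, so they
        # are displayed only when the dump actually contains them.
        if f in klass:
            lines.append(f"{f} = {klass[f]} (extended)")
    return lines


# A dump without extended fields and one that carries init_state.
old_dump = json.loads(
    '{"name": "Test", "super": "java/lang/Object", "flags": 33}'
)
new_dump = json.loads(
    '{"name": "Test", "super": "java/lang/Object", "flags": 33,'
    ' "init_state": "linked"}'
)
```

Both records are accepted by the same client; only a record lacking a
base field is rejected.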
------------------------------------------------------------------
From:Ioi Lam <ioi....@oracle.com>
Send Time:2023 Jan. 12 (Thu.) 08:15
To:hotspot-runtime-dev <hotspot-runtime-...@openjdk.org>;
serviceability-...@openjdk.java.net
<serviceability-...@openjdk.java.net>
Subject:Re: RFC: regarding metaspace(metadata?) dump
CC-ing serviceability.
Hi Yi,
In general, I think it's good to have tools for understanding the
internal layout of the class metadata.
I think there are overlaps between your proposal and existing
tools. For example, there are jcmd options such as
VM.class_hierarchy and VM.classes, etc.
The Serviceability Agent can also be used to analyze the contents
of the class metadata.
Did you look at the existing tools and see how they match up with
your requirements?
I'd be interested in seeing your implementation and compare it
with the existing tools.
On 1/11/2023 4:56 AM, Yi Yang wrote:
Hi,
Internally, we often receive feedback and requests for help from
users on metaspace-related issues, for example:
1. Users are eager to know which GroovyClassLoader loads which
classes, why they are not unloaded,
and why they are leading to Metaspace OOME.
2. They want to know the class structure of dynamically generated
classes in some scenarios such as
deserialization
3. Finding memory leaks caused by duplicated classes
...
Internally we implemented a metaspace dump that generates
human-readable text. It looks something like this:
[Basic Information]
Dump Reason : JCMD
MaxMetaspaceSize : 18446744073709547520 B
CompressedClassSpaceSize : 1073741824 B
Class Space Used : 309992 B
Class Space Capacity : 395264 B
...
[Class Loader Data]
ClassLoaderData : loader = 0x000000008024f928, loader_klass =
0x0000000800010098, loader_klass_name =
sun/misc/Launcher$AppClassLoader, label = N/A
Class Used Chunks :
* Chunk : [0x0000000800060000, 0x0000000800060230,
0x0000000800060800)
NonClass Used Chunks :
* Chunk : [0x00007fd8379c1000, 0x00007fd8379c1350,
0x00007fd8379c2000)
Klasses :
Klass : 0x0000000800060028, name = Test, size = 520 B
ConstantPool : 0x00007fd8379c1050, size = 296 B
...
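For comparison, consuming the text format above means writing a
pattern-based parser. The Python sketch below handles only the two
entry shapes visible in the excerpt (Klass and ConstantPool) and
assumes each entry sits on one line; the regular expression and the
parse_entries() helper are illustrative, not part of the internal tool.

```python
import re

# Two entry lines copied from the excerpt above, assuming one entry per line.
EXCERPT = """\
Klasses :
Klass : 0x0000000800060028, name = Test, size = 520 B
ConstantPool : 0x00007fd8379c1050, size = 296 B
"""

# kind : address[, name = X], size = N B
ENTRY = re.compile(
    r"^\s*(?P<kind>\w+)\s*:\s*(?P<addr>0x[0-9a-fA-F]+),"
    r"(?:\s*name = (?P<name>[^,]+),)?"
    r"\s*size = (?P<size>\d+) B\s*$"
)


def parse_entries(text):
    """Extract kind, address, optional name and size from matching lines."""
    entries = []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            entries.append({
                "kind": m.group("kind"),
                "address": int(m.group("addr"), 16),
                "name": m.group("name"),  # None when the entry has no name
                "size": int(m.group("size")),
            })
    return entries


entries = parse_entries(EXCERPT)
total_bytes = sum(e["size"] for e in entries)
```

Every new kind of line needs its own pattern here, which is part of the
motivation for a structured format.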
It has been working effectively for several years and has helped
many users solve metaspace-related problems.
However, it would be more user-friendly if the JDK supported this
capability natively. We hope that the format of the metaspace
dump file can take both flexibility and compatibility into
account, and that the content of the dump file is detailed
enough to meet the needs of both application developers and
lower-level developers.
Based on the above considerations, I think JSON is an appropriate
file format (although XML or a binary format are not excluded as
candidates). Specifically, in my earlier thinking, the format of
the metaspace dump file could be as follows (pretty printed):
https://gist.github.com/y1yang0/ab3034b6381b8a9d215602c89af4e9c3
Using the JSON format, we can flexibly add new fields without
breaking compatibility. Which data to write is debatable. We can
reach a consensus that third-party parsers (a Metaspace Analyzer
Tool) can at least reconstruct Java source code from the dump file.
This may be quite difficult, because the metadata contains
rewritten Java bytecodes. The rewriting format may be dependent on
the JDK version. Also, the class linkage (the resolution of
constant pool information) will be vastly different from one JDK
version to another. So writing a third party tool that can work
with multiple JDK versions will be quite hard. Also, defining a
"portable" format for the dump will be difficult, since we don't
know how the internal data structure will evolve in the future.
Thanks
- Ioi
Based on this, we can write more useful information for low-level
troubleshooting
or debugging. (e.g. the init_state of InstanceKlass).
In addition, we can even output the native code and associated
information for each Method, so that a third-party parser can
reconstruct the human-readable assembly representation of the
compiled method from the dump file. To some extent, this also
gives us a code cache dump as a by-product. For this reason, I'm
not sure whether the RFC proposal should be titled metaspace
dump; maybe metadata dump? It looks more like a metadata-dump
framework.
Do you have any thoughts about metaspace/metadata dump? Looking
forward to hearing your feedback, any comments are invaluable!
Best regards,
Yi Yang