I’ve edited the text of this JEP in JBS to tighten the prose, correct terminology, and fix a few formatting issues. HTML and diffs attached for reference. Please review and make any necessary corrections, then I’ll move it to Candidate.
- Mark
JEP TBD: Asynchronous Stack Trace VM API
Summary
Define an efficient and reliable API to collect stack traces asynchronously and include information on both Java and native stack frames.
Goals
- Provide a well-tested API for profilers to obtain information on Java and native frames.
- Support asynchronous usage, e.g., calling from signal handlers.
- Do not affect performance when the API is not in use.
- Do not significantly increase memory requirements compared to the existing AsyncGetCallTrace API.
Non-Goals
- It is not a goal to recommend the new API for production use, since it can crash the VM. We will minimize the chances of that via extensive testing and fuzzing.
Motivation
The AsyncGetCallTrace API is used by almost all available profilers, both open-source and commercial, including async-profiler. Yet it has two major disadvantages:
- It is an internal API, not exported in any header, and
- It only returns information about Java frames, namely their method and bytecode indices.
These issues make implementing profilers and related tooling more difficult. Some additional information can be extracted from the HotSpot VM via complex code, but other useful information is hidden and impossible to obtain:
- Whether a compiled Java frame is inlined (currently only obtainable for the topmost compiled frames),
- The compilation level of a Java frame (i.e., compiled by C1 or C2), and
- Information on C/C++ frames that are not at the top of the stack.
Such data can be helpful when profiling and tuning a VM for a given application, and for profiling code that uses JNI heavily.
Description
We propose a new AsyncGetStackTrace API, modeled on the AsyncGetCallTrace API:
void AsyncGetStackTrace(CallTrace *trace, jint depth, void* ucontext,
uint32_t options);
This API can be called by profilers to obtain the stack trace for the current thread. Calling this API from a signal handler is safe, and the new implementation will be at least as stable as AsyncGetCallTrace or the JFR stack walking code. The VM fills in information about the frames and the number of frames. The caller of the API should allocate the CallTrace structure and its frames array with sufficient memory for the requested stack depth.
Parameters:
- trace: buffer for structured data to be filled in by the VM
- depth: maximum depth of the call stack trace
- ucontext: optional ucontext_t of the current thread when it was interrupted
- options: bit set for options
Currently only the lowest bit of options is considered: it enables (1) or disables (0) the inclusion of C/C++ frames. All other bits are considered to be 0.
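A minimal sketch of how a caller might build and inspect the options argument, assuming only what is stated above (bit 0 toggles C/C++ frames and every other bit is treated as 0). The constant and helper names are hypothetical and not part of the proposed API:

```cpp
#include <cstdint>

// Hypothetical name for the only option bit described above (bit 0).
constexpr uint32_t OPTION_INCLUDE_C_FRAMES = 1u << 0;

// Build an options value; all bits other than bit 0 are left at 0.
constexpr uint32_t make_options(bool include_c_frames) {
    return include_c_frames ? OPTION_INCLUDE_C_FRAMES : 0u;
}

// Report whether an options value requests C/C++ frames.
// Only bit 0 is inspected; any other bits are ignored.
constexpr bool includes_c_frames(uint32_t options) {
    return (options & OPTION_INCLUDE_C_FRAMES) != 0;
}
```

A profiler that wants mixed Java and native stacks would pass make_options(true) as the last argument of AsyncGetStackTrace.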
The trace struct
typedef struct {
jint num_frames; // number of frames in this trace
CallFrame *frames; // frames
void* frame_info; // more information on frames
} CallTrace;
is filled in by the VM. Its num_frames field contains the actual number of frames in the frames array or an error code. The frame_info field in that structure can later be used to store more information, but is currently NULL.
The error codes are a subset of the error codes for AsyncGetCallTrace, with the addition of THREAD_NOT_JAVA related to calling this procedure for non-Java threads:
enum Error {
NO_JAVA_FRAME = 0,
NO_CLASS_LOAD = -1,
GC_ACTIVE = -2,
UNKNOWN_NOT_JAVA = -3,
NOT_WALKABLE_NOT_JAVA = -4,
UNKNOWN_JAVA = -5,
UNKNOWN_STATE = -7,
THREAD_EXIT = -8,
DEOPT = -9,
THREAD_NOT_JAVA = -10
};
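Since num_frames carries either a frame count or one of these codes, a caller has to tell the two apart before touching the frames array. The following sketch assumes that any value less than or equal to zero is an error (NO_JAVA_FRAME is 0 in the enum above). The helper names is_error and describe are hypothetical, and jint is a local stand-in for the typedef from jni.h:

```cpp
#include <cstdint>
#include <string>

typedef int32_t jint;  // stand-in for the jint typedef from jni.h

enum Error {
    NO_JAVA_FRAME         =   0,
    NO_CLASS_LOAD         =  -1,
    GC_ACTIVE             =  -2,
    UNKNOWN_NOT_JAVA      =  -3,
    NOT_WALKABLE_NOT_JAVA =  -4,
    UNKNOWN_JAVA          =  -5,
    UNKNOWN_STATE         =  -7,
    THREAD_EXIT           =  -8,
    DEOPT                 =  -9,
    THREAD_NOT_JAVA       = -10
};

// Assumption: positive values are frame counts, everything else is an error.
inline bool is_error(jint num_frames) {
    return num_frames <= 0;
}

// Map a num_frames value to a readable label for logging.
inline std::string describe(jint num_frames) {
    switch (num_frames) {
        case NO_JAVA_FRAME:         return "NO_JAVA_FRAME";
        case NO_CLASS_LOAD:         return "NO_CLASS_LOAD";
        case GC_ACTIVE:             return "GC_ACTIVE";
        case UNKNOWN_NOT_JAVA:      return "UNKNOWN_NOT_JAVA";
        case NOT_WALKABLE_NOT_JAVA: return "NOT_WALKABLE_NOT_JAVA";
        case UNKNOWN_JAVA:          return "UNKNOWN_JAVA";
        case UNKNOWN_STATE:         return "UNKNOWN_STATE";
        case THREAD_EXIT:           return "THREAD_EXIT";
        case DEOPT:                 return "DEOPT";
        case THREAD_NOT_JAVA:       return "THREAD_NOT_JAVA";
        default:
            // Positive values are frame counts; other negatives (e.g. -6)
            // are not part of the subset listed above.
            return num_frames > 0 ? "OK" : "UNRECOGNIZED";
    }
}
```

Building the label allocates a std::string, so a profiler would call describe when reporting collected samples, not inside the signal handler.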
Every CallFrame is a union, since the information stored for Java and non-Java frames differs:
typedef union {
FrameTypeId type; // to distinguish between JavaFrame and NonJavaFrame
JavaFrame java_frame;
NonJavaFrame non_java_frame;
} CallFrame;
There are several distinguishable frame types:
enum FrameTypeId : uint8_t {
FRAME_JAVA = 1, // JIT compiled and interpreted
FRAME_JAVA_INLINED = 2, // inlined JIT compiled
FRAME_NATIVE = 3, // native wrapper to call C methods from Java
FRAME_STUB = 4, // VM generated stubs
FRAME_CPP = 5 // C/C++/... frames
};
The first two types are for Java frames, for which we store the following information in a struct of type JavaFrame:
typedef struct {
FrameTypeId type; // frame type
int8_t comp_level; // compilation level, 0 is interpreted
uint16_t bci; // 0 <= bci < 65536
jmethodID method_id;
} JavaFrame; // used for FRAME_JAVA, FRAME_JAVA_INLINED and FRAME_NATIVE
The comp_level indicates the compilation level of the method related to the frame, with higher numbers representing higher levels of compilation. It is modeled after the CompLevel enum in HotSpot but is dependent on the compiler infrastructure used. A value of zero indicates no compilation, i.e., bytecode interpretation.
Information on all other frames is stored in NonJavaFrame structs:
typedef struct {
FrameTypeId type; // frame type
void *pc; // current program counter inside this frame
} NonJavaFrame;
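To illustrate how the type tag selects the active union member, the sketch below walks a filled-in CallTrace and tallies frames by kind. It repeats the declarations above so that it compiles on its own. jint and jmethodID are local stand-ins for the jni.h typedefs, and FrameCounts and count_frames are hypothetical helpers, not part of the proposal:

```cpp
#include <cstdint>

// Stand-ins for the JNI typedefs normally provided by jni.h.
typedef int32_t jint;
typedef void*   jmethodID;

enum FrameTypeId : uint8_t {
    FRAME_JAVA = 1, FRAME_JAVA_INLINED = 2, FRAME_NATIVE = 3,
    FRAME_STUB = 4, FRAME_CPP = 5
};

typedef struct {
    FrameTypeId type;
    int8_t      comp_level;
    uint16_t    bci;
    jmethodID   method_id;
} JavaFrame;

typedef struct {
    FrameTypeId type;
    void*       pc;
} NonJavaFrame;

typedef union {
    FrameTypeId  type;
    JavaFrame    java_frame;
    NonJavaFrame non_java_frame;
} CallFrame;

typedef struct {
    jint       num_frames;
    CallFrame* frames;
    void*      frame_info;
} CallTrace;

// Hypothetical summary of one trace.
struct FrameCounts {
    int java;             // FRAME_JAVA
    int inlined;          // FRAME_JAVA_INLINED
    int native_wrappers;  // FRAME_NATIVE
    int non_java;         // FRAME_STUB and FRAME_CPP
};

// Count frames by kind. If num_frames holds an error code (<= 0),
// the loop body never runs and all counts stay zero.
inline FrameCounts count_frames(const CallTrace& trace) {
    FrameCounts counts = {0, 0, 0, 0};
    for (jint i = 0; i < trace.num_frames; i++) {
        switch (trace.frames[i].type) {
            case FRAME_JAVA:         counts.java++;            break;
            case FRAME_JAVA_INLINED: counts.inlined++;         break;
            case FRAME_NATIVE:       counts.native_wrappers++; break;
            default:                 counts.non_java++;        break;
        }
    }
    return counts;
}
```

A real consumer would read java_frame.method_id and java_frame.bci for the first three frame types and non_java_frame.pc for the others. The frames buffer itself should be allocated before the signal handler runs, since malloc is not async-signal-safe.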
Although the API provides more information, the amount of space required per frame (e.g., 16 bytes on x86) is the same as for the existing AsyncGetCallTrace API.
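The size claim can be checked at compile time. The sketch below compares the proposed CallFrame union with the frame struct used by AsyncGetCallTrace, reproduced here from memory as ASGCT_CallFrame (a jint line number followed by a jmethodID). That layout and the jni.h stand-in typedefs are assumptions of this example, not part of the JEP text:

```cpp
#include <cstdint>

// Stand-ins for the JNI typedefs normally provided by jni.h.
typedef int32_t jint;
typedef void*   jmethodID;

// Assumed per-frame layout of the existing AsyncGetCallTrace API.
typedef struct {
    jint      lineno;
    jmethodID method_id;
} ASGCT_CallFrame;

enum FrameTypeId : uint8_t {
    FRAME_JAVA = 1, FRAME_JAVA_INLINED = 2, FRAME_NATIVE = 3,
    FRAME_STUB = 4, FRAME_CPP = 5
};

typedef struct {
    FrameTypeId type;
    int8_t      comp_level;
    uint16_t    bci;
    jmethodID   method_id;
} JavaFrame;

typedef struct {
    FrameTypeId type;
    void*       pc;
} NonJavaFrame;

typedef union {
    FrameTypeId  type;
    JavaFrame    java_frame;
    NonJavaFrame non_java_frame;
} CallFrame;

// The type tag, compilation level, and bci together fit in the four
// bytes that the old struct spends on lineno, so the pointer-sized
// member that follows lands at the same offset in both layouts.
static_assert(sizeof(CallFrame) == sizeof(ASGCT_CallFrame),
              "new frame should not be larger than the old one");
static_assert(sizeof(JavaFrame) == sizeof(NonJavaFrame),
              "both union members should have the same size");
```

On a typical 64-bit ABI both sizes come out as 16 bytes, which matches the figure quoted above.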
We propose to place these declarations in a new static header file, profile.h. In the source tree it could be located in src/java.base/share/native/include; in a delivered JDK bundle it should be copied into the include directory. The header’s license must include the Classpath Exception so that it is consumable by third-party profiling tools.
A prototype implementation can be found here, and a demo combining it with a modified async-profiler can be found here.
Risks and Assumptions
Returning information on C/C++ frames leaks implementation details, but this is also true for the Java frames of AsyncGetCallTrace since they leak details of the implementation of standard library files and include native wrapper frames.
Testing
We will add new stress tests to identify stability problems on all supported platforms. We plan to profile a set of example programs (e.g., the DaCapo and Renaissance benchmark suites) repeatedly with small profiling intervals (<= 0.1 ms). We will also add substantial unit tests that cover all options and test the basic usage of the API.
A prototype implementation can be found at https://github.com/openjdk/jdk-sandbox/tree/asgct2 and a demo combining it with a modified async-profiler can be found at https://github.com/parttimenerd/asgct2-demo