I’ve edited the text of this JEP in JBS to tighten the prose, correct
terminology, and fix a few formatting issues.  HTML and diffs attached
for reference.  Please review and make any necessary corrections, then
I’ll move it to Candidate.

- Mark
Title: JEP TBD: Asynchronous Stack Trace VM API

JEP TBD: Asynchronous Stack Trace VM API

Summary

Define an efficient and reliable API to collect stack traces asynchronously and include information on both Java and native stack frames.

Goals

  • Provide a well-tested API for profilers to obtain information on Java and native frames.

  • Support asynchronous usage, e.g., calling from signal handlers.

  • Do not affect performance when the API is not in use.

  • Do not significantly increase memory requirements compared to the existing AsyncGetCallTrace API.

Non-Goals

  • It is not a goal to recommend the new API for production use, since it can crash the VM. We will minimize that risk through extensive testing and fuzzing.

Motivation

The AsyncGetCallTrace API is used by almost all available profilers, both open-source and commercial, including async-profiler. Yet it has two major disadvantages:

  • It is an internal API, not exported in any header, and
  • It only returns information about Java frames, namely their method and bytecode indices.

These issues make implementing profilers and related tooling more difficult. Some additional information can be extracted from the HotSpot VM via complex code, but other useful information is hidden and impossible to obtain:

  • Whether a compiled Java frame is inlined (currently only obtainable for the topmost compiled frames),
  • The compilation level of a Java frame (i.e., compiled by C1 or C2), and
  • Information on C/C++ frames that are not at the top of the stack.

Such data can be helpful when profiling and tuning a VM for a given application, and for profiling code that uses JNI heavily.

Description

We propose a new AsyncGetStackTrace API, modeled on the AsyncGetCallTrace API:

void AsyncGetStackTrace(CallTrace *trace, jint depth, void* ucontext,
                        uint32_t options);

This API can be called by profilers to obtain the stack trace for the current thread. Calling this API from a signal handler is safe, and the new implementation will be at least as stable as AsyncGetCallTrace or the JFR stack walking code. The VM fills in information about the frames and the number of frames. The caller of the API should allocate the CallTrace struct and its frames array, with room for at least depth CallFrame elements.

Parameters:

  • trace — buffer for structured data to be filled in by the VM
  • depth — maximum depth of the call stack trace
  • ucontext — optional ucontext_t of the current thread when it was interrupted
  • options — bit set for options

Currently only the lowest bit of the options is considered: It enables (1) or disables (0) the inclusion of C/C++ frames. All other bits are considered to be 0.
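A minimal sketch of how a caller might build and interpret the options word. The constant name kIncludeCFrames and both helper functions are hypothetical; the proposal only defines the meaning of bit 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical name for bit 0, the only option bit defined so far:
// set = include C/C++ frames, clear = Java frames only.
constexpr uint32_t kIncludeCFrames = 1u << 0;

// Builds the options argument for AsyncGetStackTrace.
constexpr uint32_t make_options(bool include_c_frames) {
  return include_c_frames ? kIncludeCFrames : 0u;
}

// Reads the word the way the proposal describes: only bit 0 counts,
// every other bit is treated as if it were 0.
constexpr bool wants_c_frames(uint32_t options) {
  return (options & kIncludeCFrames) != 0;
}
```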

The trace struct

typedef struct {
  jint num_frames;                // number of frames in this trace
  CallFrame *frames;              // frames
  void* frame_info;               // more information on frames
} CallTrace;

is filled in by the VM. Its num_frames field contains the actual number of frames in the frames array or an error code. The frame_info field in that structure can later be used to store more information, but is currently NULL.
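The caller has to provide the storage the VM writes into. The sketch below shows one way a profiler might do that: because malloc is not async-signal-safe, it reserves the frames array once, outside the signal handler, and only fills it in from the handler. TraceBuffer, init_trace_buffer, sample, and kMaxDepth are hypothetical names; the jint and jmethodID aliases stand in for jni.h, and the API is passed in as a function pointer so the code compiles without a VM.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Stand-ins for the jni.h types, so the sketch builds without a JDK.
using jint = int32_t;
struct _jmethodID;
using jmethodID = _jmethodID*;

// Frame and trace layouts as proposed in this JEP.
enum FrameTypeId : uint8_t {
  FRAME_JAVA = 1, FRAME_JAVA_INLINED = 2, FRAME_NATIVE = 3,
  FRAME_STUB = 4, FRAME_CPP = 5
};
struct JavaFrame { FrameTypeId type; int8_t comp_level; uint16_t bci; jmethodID method_id; };
struct NonJavaFrame { FrameTypeId type; void* pc; };
union CallFrame { FrameTypeId type; JavaFrame java_frame; NonJavaFrame non_java_frame; };
struct CallTrace { jint num_frames; CallFrame* frames; void* frame_info; };

// Signature of the proposed entry point, taken as a parameter so that
// this sketch does not need to link against a VM.
using AsyncGetStackTraceFn = void (*)(CallTrace*, jint, void*, uint32_t);

constexpr jint kMaxDepth = 1024;  // hypothetical depth limit

// Hypothetical per-thread buffer: the frames array is reserved up front,
// because malloc must not be called from a signal handler.
struct TraceBuffer {
  CallTrace trace{};
  std::array<CallFrame, kMaxDepth> frames{};
};

// Run once per thread, outside any signal handler.
void init_trace_buffer(TraceBuffer& buf) {
  buf.trace.num_frames = 0;
  buf.trace.frames = buf.frames.data();
  buf.trace.frame_info = nullptr;
}

// What a sampling signal handler would do: request up to kMaxDepth frames,
// including C/C++ frames (options bit 0), and return num_frames, which is
// a frame count if positive and an error code otherwise.
jint sample(TraceBuffer& buf, void* ucontext, AsyncGetStackTraceFn get_trace) {
  get_trace(&buf.trace, kMaxDepth, ucontext, 1u);
  return buf.trace.num_frames;
}
```

In an agent, init_trace_buffer would run when a thread starts, and sample would be called from the SIGPROF handler with the handler's ucontext argument.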

The error codes are a subset of the error codes for AsyncGetCallTrace, with the addition of THREAD_NOT_JAVA, which indicates that the API was called on a non-Java thread:

enum Error {
  NO_JAVA_FRAME         =   0,
  NO_CLASS_LOAD         =  -1, 
  GC_ACTIVE             =  -2,    
  UNKNOWN_NOT_JAVA      =  -3,
  NOT_WALKABLE_NOT_JAVA =  -4,
  UNKNOWN_JAVA          =  -5,
  UNKNOWN_STATE         =  -7,
  THREAD_EXIT           =  -8,
  DEOPT                 =  -9,
  THREAD_NOT_JAVA       = -10
};
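Since num_frames doubles as a frame count and a status, a caller has to check its sign before reading any frames. A minimal sketch, assuming the enum exactly as listed above; describe_num_frames is a hypothetical helper, and codes missing from the list (such as -6) are reported as unrecognized.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

using jint = int32_t;  // stand-in for the jni.h type

// Error codes as listed in the proposal.
enum Error {
  NO_JAVA_FRAME         =   0,
  NO_CLASS_LOAD         =  -1,
  GC_ACTIVE             =  -2,
  UNKNOWN_NOT_JAVA      =  -3,
  NOT_WALKABLE_NOT_JAVA =  -4,
  UNKNOWN_JAVA          =  -5,
  UNKNOWN_STATE         =  -7,
  THREAD_EXIT           =  -8,
  DEOPT                 =  -9,
  THREAD_NOT_JAVA       = -10
};

// Interprets CallTrace::num_frames after a call: a positive value is the
// number of frames written, anything else is a status from the list above.
std::string_view describe_num_frames(jint num_frames) {
  if (num_frames > 0) return "ok";
  switch (num_frames) {
    case NO_JAVA_FRAME:         return "NO_JAVA_FRAME";
    case NO_CLASS_LOAD:         return "NO_CLASS_LOAD";
    case GC_ACTIVE:             return "GC_ACTIVE";
    case UNKNOWN_NOT_JAVA:      return "UNKNOWN_NOT_JAVA";
    case NOT_WALKABLE_NOT_JAVA: return "NOT_WALKABLE_NOT_JAVA";
    case UNKNOWN_JAVA:          return "UNKNOWN_JAVA";
    case UNKNOWN_STATE:         return "UNKNOWN_STATE";
    case THREAD_EXIT:           return "THREAD_EXIT";
    case DEOPT:                 return "DEOPT";
    case THREAD_NOT_JAVA:       return "THREAD_NOT_JAVA";
    default:                    return "unrecognized";
  }
}
```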

Each CallFrame is a union, since the information stored for Java and non-Java frames differs:

typedef union {
  FrameTypeId type;     // to distinguish between JavaFrame and NonJavaFrame 
  JavaFrame java_frame;
  NonJavaFrame non_java_frame;
} CallFrame;

There are several distinguishable frame types:

enum FrameTypeId : uint8_t {
  FRAME_JAVA         = 1, // JIT compiled and interpreted
  FRAME_JAVA_INLINED = 2, // inlined JIT compiled
  FRAME_NATIVE       = 3, // native wrapper to call C methods from Java
  FRAME_STUB         = 4, // VM generated stubs
  FRAME_CPP          = 5  // C/C++/... frames
};

The first three types describe frames of Java methods, including the native wrappers that call C methods from Java. For these we store the following information in a struct of type JavaFrame:

typedef struct {
  FrameTypeId type;       // frame type
  int8_t comp_level;      // compilation level, 0 is interpreted
  uint16_t bci;           // 0 <= bci < 65536
  jmethodID method_id;
} JavaFrame;              // used for FRAME_JAVA, FRAME_JAVA_INLINED and FRAME_NATIVE

The comp_level field indicates the compilation level of the method related to the frame, with higher numbers representing higher levels of compilation. It is modeled after the CompLevel enum in HotSpot, but its values depend on the compiler infrastructure in use. A value of zero indicates no compilation, i.e., bytecode interpretation.
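A sketch of how a profiler might label comp_level values, assuming they carry HotSpot's tiered CompLevel numbering (0 for none, 1 to 3 for the C1 tiers, 4 for full optimization). Because the proposal leaves the values dependent on the compiler infrastructure, anything else is reported as unknown; describe_comp_level is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Assumes comp_level uses HotSpot's tiered CompLevel numbering:
// 0 none (interpreted), 1 simple, 2 limited profile, 3 full profile,
// 4 full optimization. Other compiler setups may use other values.
std::string_view describe_comp_level(int8_t comp_level) {
  switch (comp_level) {
    case 0:
      return "interpreted";
    case 1:
    case 2:
    case 3:
      return "C1";  // the three C1 tiers differ in how much they profile
    case 4:
      return "C2";  // or a JVMCI compiler, if one replaces C2
    default:
      return "unknown";
  }
}
```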

Information on all other frames is stored in NonJavaFrame structs:

typedef struct {
  FrameTypeId type;  // frame type
  void *pc;          // current program counter inside this frame
} NonJavaFrame;  
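A minimal sketch of walking a filled-in trace and dispatching on the frame type. It follows the comment on JavaFrame in treating FRAME_NATIVE frames as JavaFrame records, and it reads the tag through java_frame because both structs begin with the same FrameTypeId member. FrameCounts and count_frames are hypothetical, and the jint and jmethodID aliases stand in for jni.h.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the jni.h types, so the sketch builds without a JDK.
using jint = int32_t;
struct _jmethodID;
using jmethodID = _jmethodID*;

// Frame and trace layouts as proposed in this JEP.
enum FrameTypeId : uint8_t {
  FRAME_JAVA = 1, FRAME_JAVA_INLINED = 2, FRAME_NATIVE = 3,
  FRAME_STUB = 4, FRAME_CPP = 5
};
struct JavaFrame { FrameTypeId type; int8_t comp_level; uint16_t bci; jmethodID method_id; };
struct NonJavaFrame { FrameTypeId type; void* pc; };
union CallFrame { FrameTypeId type; JavaFrame java_frame; NonJavaFrame non_java_frame; };
struct CallTrace { jint num_frames; CallFrame* frames; void* frame_info; };

// Hypothetical per-type tally for one trace.
struct FrameCounts {
  int java = 0, inlined = 0, native = 0, stub = 0, cpp = 0;
};

FrameCounts count_frames(const CallTrace& trace) {
  FrameCounts counts;
  // A num_frames of 0 or less is an error code, so the loop body never runs.
  for (jint i = 0; i < trace.num_frames; ++i) {
    const CallFrame& frame = trace.frames[i];
    // JavaFrame and NonJavaFrame both start with the FrameTypeId tag, so it
    // can be read through java_frame whichever member the VM filled in.
    switch (frame.java_frame.type) {
      case FRAME_JAVA:          // read frame.java_frame (method_id, bci, comp_level)
        ++counts.java;
        break;
      case FRAME_JAVA_INLINED:  // read frame.java_frame
        ++counts.inlined;
        break;
      case FRAME_NATIVE:        // read frame.java_frame, per its declaration
        ++counts.native;
        break;
      case FRAME_STUB:          // read frame.non_java_frame.pc
        ++counts.stub;
        break;
      case FRAME_CPP:           // read frame.non_java_frame.pc
        ++counts.cpp;
        break;
    }
  }
  return counts;
}
```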

Although the API provides more information, the amount of space required per frame (e.g., 16 bytes on x86-64) is the same as for the existing AsyncGetCallTrace API.
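The size claim can be checked at compile time. This sketch redeclares the proposed frame types next to the ASGCT_CallFrame record that HotSpot uses for AsyncGetCallTrace (a jint lineno followed by a jmethodID method_id) and asserts that a CallFrame is no larger; the jni.h types are again replaced by stand-in aliases, and the checks assume a mainstream ABI where pointers are aligned to their size.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the jni.h types.
using jint = int32_t;
struct _jmethodID;
using jmethodID = _jmethodID*;

// Proposed frame records.
enum FrameTypeId : uint8_t {
  FRAME_JAVA = 1, FRAME_JAVA_INLINED = 2, FRAME_NATIVE = 3,
  FRAME_STUB = 4, FRAME_CPP = 5
};
struct JavaFrame { FrameTypeId type; int8_t comp_level; uint16_t bci; jmethodID method_id; };
struct NonJavaFrame { FrameTypeId type; void* pc; };
union CallFrame { FrameTypeId type; JavaFrame java_frame; NonJavaFrame non_java_frame; };

// Frame record of the existing AsyncGetCallTrace API, as declared in HotSpot.
struct ASGCT_CallFrame { jint lineno; jmethodID method_id; };

// type, comp_level and bci pack into 4 bytes ahead of the pointer, so after
// alignment padding each frame takes two pointer-sized slots: 16 bytes on
// x86-64, the same as an ASGCT_CallFrame.
static_assert(sizeof(JavaFrame) == 2 * sizeof(void*), "JavaFrame is two slots");
static_assert(sizeof(NonJavaFrame) == 2 * sizeof(void*), "NonJavaFrame is two slots");
static_assert(sizeof(CallFrame) == sizeof(ASGCT_CallFrame), "no per-frame growth");
```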

We propose to place these declarations in a new static header file, profile.h. In the source tree it could be located in src/java.base/share/native/include; in a delivered JDK bundle it should be copied into the include directory. The header’s license must include the Classpath Exception so that it is consumable by third-party profiling tools.

A prototype implementation can be found here, and a demo combining it with a modified async-profiler can be found here.

Risks and Assumptions

Returning information on C/C++ frames leaks implementation details, but AsyncGetCallTrace already does the same: its Java frames expose implementation details of the standard library and include native wrapper frames.

Testing

We will add new stress tests to identify stability problems on all supported platforms. We plan to profile a set of example programs (e.g., the DaCapo and Renaissance benchmark suites) repeatedly with small profiling intervals (<= 0.1ms). We will also add substantial unit tests which should cover all options and test the basic usage of the API.
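A stress harness needs to interrupt the profiled program at a high rate. One common POSIX mechanism is a CPU-time interval timer (setitimer with ITIMER_PROF) that delivers SIGPROF; the sketch below requests a 0.1 ms interval and only counts deliveries where a real harness would call AsyncGetStackTrace. The mechanism and the names run_sampling and on_sigprof are assumptions for illustration, not the planned test setup, and the kernel may deliver signals less often than requested.

```cpp
#include <atomic>
#include <cassert>
#include <ctime>
#include <signal.h>
#include <sys/time.h>

// Count of SIGPROF deliveries; a lock-free atomic may be updated from a handler.
static std::atomic<long> g_samples{0};

// A real harness would call AsyncGetStackTrace(trace, depth, ucontext, options)
// here on a preallocated buffer; this sketch only counts the samples.
static void on_sigprof(int /*signo*/, siginfo_t* /*info*/, void* /*ucontext*/) {
  g_samples.fetch_add(1, std::memory_order_relaxed);
}

// Asks for a SIGPROF every interval_us microseconds of consumed CPU time,
// burns roughly cpu_budget_ms of CPU, then restores the previous setup and
// returns how many samples arrived.
long run_sampling(long interval_us, long cpu_budget_ms) {
  struct sigaction sa {}, old_sa {};
  sa.sa_sigaction = on_sigprof;
  sa.sa_flags = SA_SIGINFO | SA_RESTART;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGPROF, &sa, &old_sa);

  g_samples.store(0);
  itimerval timer {}, old_timer {};
  timer.it_interval.tv_sec = interval_us / 1000000;
  timer.it_interval.tv_usec = interval_us % 1000000;
  timer.it_value = timer.it_interval;  // first signal after one interval
  setitimer(ITIMER_PROF, &timer, &old_timer);

  // Stand-in for the profiled benchmark: spin until the CPU budget is used.
  const std::clock_t deadline = std::clock() + cpu_budget_ms * (CLOCKS_PER_SEC / 1000);
  volatile unsigned long sink = 0;
  while (std::clock() < deadline) sink = sink + 1;

  setitimer(ITIMER_PROF, &old_timer, nullptr);
  sigaction(SIGPROF, &old_sa, nullptr);
  return g_samples.load();
}
```

Lowering interval_us raises the requested rate, but the observed rate is bounded by the kernel's timer granularity.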

A prototype implementation can be found at https://github.com/openjdk/jdk-sandbox/tree/asgct2, and a demo combining it with a modified async-profiler can be found at https://github.com/parttimenerd/asgct2-demo.
