gortiz opened a new pull request, #17851:
URL: https://github.com/apache/pinot/pull/17851
## Summary
This PR reworks Pinot's continuous JFR integration to use the JVM
DiagnosticCommand MBean (`JFR.configure`, `JFR.start`, `JFR.stop`) instead of
the `jdk.jfr.Recording` object API, while keeping cluster-level dynamic control
through `ContinuousJfrStarter`.
## Why this change is needed
The previous implementation had correctness and operability issues:
- It used `Recording.setDestination(...)`, which does not match the intended
long-running "continuous recording + operator-managed dumps" behavior.
- It mixed concerns by trying to handle dump-file lifecycle/cleanup in Pinot
code.
- It did not provide a reliable way to control JFR repository location
through Pinot in the way operators need.
- It made semantics around dump boundaries/rotation unclear.
The new approach aligns Pinot with JFR's actual control model and removes
behavior that was easy to misinterpret.
## Why we decided not to keep using `Recording`
We moved away from `Recording` API for control flow because important
operational knobs we need are in the global JFR control plane, not the
per-recording API:
- repository path (`repositorypath`)
- dump path (`dumppath`)
- unified runtime command semantics for start/stop/configure
Using DiagnosticCommand MBean gives one coherent mechanism for all required
controls and maps directly to `jcmd` behavior.
## Why we still keep `ContinuousJfrStarter` (instead of only `jcmd` or
`JAVA_OPTS`)
We intentionally keep `ContinuousJfrStarter` as Pinot's control plane
because it gives cluster-wide operability that `JAVA_OPTS`/manual `jcmd` do not:
- Dynamic reconfiguration without restart via Pinot cluster config updates.
- Single cluster-level control surface for enable/disable and JFR parameters.
- No need to manually run commands on many nodes during incidents.
- Better consistency with Pinot's existing config propagation model across
cluster roles.
`JAVA_OPTS` and direct `jcmd` remain valid tools, but they are not
sufficient as the primary Pinot control plane for large clusters.
## What changed
### `ContinuousJfrStarter` behavior
- Uses DiagnosticCommand MBean commands:
- `jfrConfigure` for runtime options (`repositorypath`, `dumppath`)
- `jfrStart` for recording start
- `jfrStop` for stop by recording name
- Supports cluster config mapping:
- `pinot.jfr.directory` -> `repositorypath` (kept for backward
compatibility)
- `pinot.jfr.dumpPath` -> `dumppath`
- existing keys still supported: `enabled`, `name`, `configuration`,
`toDisk`, `maxSize`, `maxAge`, `dumpOnExit`
- Keeps one continuous recording model; no dump rotation/cleanup thread
logic.
### Reliability hardening
- If DiagnosticCommand MBean is unavailable, Pinot logs warnings and safely
no-ops instead of failing startup/runtime paths.
- Static initialization is safe even if `ObjectName` creation fails (logs
warning; does not throw).
### Tests
- Unit-style tests validate command generation and control flow.
- Added integration test that uses the real DiagnosticCommand MBean and
verifies recording visibility via `jfrCheck`.
- Added coverage for graceful behavior when MBean is unavailable.
### Docs (on pinot-contrib/pinot-docs)
- Updated continuous JFR operator docs to reflect:
- MBean-based control model
- `pinot.jfr.directory` compatibility key and `pinot.jfr.dumpPath`
- graceful degradation when MBean is unavailable
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]