aho135 opened a new pull request, #19627:
URL: https://github.com/apache/druid/pull/19627
### Description
When a groupBy query runs, `ConcurrentGrouper` divides the single acquired
merge buffer into `druid.processing.numThreads` equal slices (`sliceSize =
capacity / numThreads`) and gives one slice to each processing thread. A query
spills to disk as soon as its **fullest single slice** fills — at roughly
`sizeBytes / numThreads`, which can be far below the configured
`druid.processing.buffer.sizeBytes`.
The existing metrics do not let an operator see this.
`mergeBuffer/maxBytesUsed` is a per-query **sum across slices**, further
discounted by the hash-table load factor, so it never approaches `sizeBytes`
even while queries are actively spilling — making it impossible to compare
against `druid.processing.buffer.sizeBytes` or to reason about spill pressure.
Concretely, an operator with `sizeBytes = 125 MiB` and `numThreads = 240`
(slices ≈ 546 KiB) saw `groupBy/spilledQueries` climbing while
`mergeBuffer/maxBytesUsed` sat around ~60 MB, which looks contradictory until
you account for slicing.
#### Change
This PR adds **`mergeBuffer/maxSpillProximity`**, a dimensionless gauge in
`[0.0, 1.0]`:
```
maxSpillProximity = maxSliceUsedBytes / (sliceSize × maxLoadFactor),
clamped to [0, 1]
↑ MAX across a query's slices, then MAX across queries
```
- It is computed **per-slice and maxed** (never summed), because the fullest
slice is what actually triggers a spill.
- The denominator is `sliceSize × maxLoadFactor` (default load factor
`0.7`), because a `BufferHashGrouper` spills when its bucket count reaches the
load factor, not when the slice is byte-full. This makes **`1.0` correspond to
the real spill point**.
Operators can read `mergeBuffer/maxSpillProximity` alongside
`groupBy/spilledQueries`: a value near `1.0` means slices are saturating, and
the fix is to widen each slice by raising `druid.processing.buffer.sizeBytes`
or lowering `druid.processing.numThreads`.
#### Changed files
- `GroupByStatsProvider` — track per-slice max used bytes and the per-slice
spill threshold; add `getSpillProximity()` (clamped to `[0,1]`); aggregate as a
max across queries.
- `SpillingGrouper` — report each slice's peak usage against its spill
threshold in `close()`.
- `BufferHashGrouper` — expose `resolveMaxLoadFactor()` so the metric
denominator matches the grouper's actual spill decision (including the
default-resolution rule).
- `GroupByStatsMonitor` — emit `mergeBuffer/maxSpillProximity`.
- `docs/operations/metrics.md` — document the new metric and clarify the
slicing semantics of `mergeBuffer/bytesUsed` and `mergeBuffer/maxBytesUsed`.
#### Backward compatibility
No existing emitted metric name or value changes. `mergeBuffer/bytesUsed`
and `mergeBuffer/maxBytesUsed` are computed through the same (unchanged) code
path as before; this PR only adds one new metric.
#### Caveats
- Limit push-down queries (`LimitedBufferHashGrouper`) spill on a heap-size
bound rather than filling the buffer arena, so for those queries the proximity
is a looser proxy; the dominant `BufferHashGrouper` path is exact.
- A query that acquires merge buffers but never initializes a grouper
reports `0.0`.
<hr>
This PR has:
- [x] been self-reviewed.
- [x] added documentation for new or modified features or behaviors.
- [x] a release note entry in the PR description. (New metric
`mergeBuffer/maxSpillProximity`; no behavior or config changes.)
- [x] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [x] added integration tests. (N/A — metric is covered by unit tests.)
- [x] been tested in a test Druid cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]