[PR] Add mergeBuffer/maxSpillProximity metric for groupBy spill diagnosis (druid)

via GitHub Wed, 24 Jun 2026 11:23:23 -0700


aho135 opened a new pull request, #19627:
URL: https://github.com/apache/druid/pull/19627


   ### Description
   
   When a groupBy query runs, `ConcurrentGrouper` divides the single acquired 
merge buffer into `druid.processing.numThreads` equal slices (`sliceSize = 
capacity / numThreads`) and gives one slice to each processing thread. A query 
spills to disk as soon as its **fullest single slice** fills — at roughly 
`sizeBytes / numThreads`, which can be far below the configured 
`druid.processing.buffer.sizeBytes`.
   
   The existing metrics do not let an operator see this. 
`mergeBuffer/maxBytesUsed` is a per-query **sum across slices**, further 
discounted by the hash-table load factor, so it never approaches `sizeBytes` 
even while queries are actively spilling — making it impossible to compare 
against `druid.processing.buffer.sizeBytes` or to reason about spill pressure.
   
   Concretely, an operator with `sizeBytes = 125 MiB` and `numThreads = 240` 
(slices ≈ 533 KiB, i.e. about 546 KB) saw `groupBy/spilledQueries` climbing while 
`mergeBuffer/maxBytesUsed` sat around 60 MB, which looks contradictory until 
you account for slicing.
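
The arithmetic behind that example can be checked with a short sketch. `SliceMath` and its method names are hypothetical and are not Druid classes. The only assumptions are that the slice size uses integer division as in `sliceSize = capacity / numThreads`, and that the reported 60 MB is taken as 60,000,000 bytes.

```java
// Illustrative arithmetic only. SliceMath is a hypothetical helper, not part of Druid.
final class SliceMath {
  // Mirrors sliceSize = capacity / numThreads from the description (integer division).
  static long sliceSize(long bufferSizeBytes, int numThreads) {
    return bufferSizeBytes / numThreads;
  }

  // Average bytes per slice implied by a summed, per-query usage figure.
  static long averagePerSlice(long summedUsedBytes, int numThreads) {
    return summedUsedBytes / numThreads;
  }

  public static void main(String[] args) {
    long sizeBytes = 125L * 1024 * 1024; // 125 MiB = 131,072,000 bytes
    int numThreads = 240;

    long slice = sliceSize(sizeBytes, numThreads);
    System.out.println("slice bytes = " + slice);
    System.out.println("slice KiB   = " + slice / 1024.0);

    // Assumption: the observed ~60 MB sum is treated as 60,000,000 bytes.
    System.out.println("avg bytes per slice = " + averagePerSlice(60_000_000L, numThreads));
  }
}
```

With these inputs each slice is 546,133 bytes, while the summed figure averages only 250,000 bytes per slice. A sum can therefore look far from `sizeBytes` even when one slice is already full.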
   
   #### Change
   
   This PR adds **`mergeBuffer/maxSpillProximity`**, a dimensionless gauge in 
`[0.0, 1.0]`:
   
   ```
   maxSpillProximity = maxSliceUsedBytes / (sliceSize × maxLoadFactor), clamped to [0, 1]
                       ↑ MAX across a query's slices, then MAX across queries
   ```
   
   - It is computed **per-slice and maxed** (never summed), because the fullest 
slice is what actually triggers a spill.
   - The denominator is `sliceSize × maxLoadFactor` (default load factor 
`0.7`), because a `BufferHashGrouper` spills when its occupied bucket count 
reaches the load-factor fraction of the buckets that fit in the slice, not when 
the slice is byte-full. This makes **`1.0` correspond to the real spill point**.
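
A minimal sketch of the formula above, assuming per-slice peak usage is available as an array of byte counts. `SpillProximitySketch` and `spillProximity` are hypothetical names for illustration. The actual change tracks these values inside `GroupByStatsProvider` and `SpillingGrouper`, whose code is not reproduced here.

```java
// Hypothetical illustration of the formula; not the code added by this PR.
final class SpillProximitySketch {
  /**
   * Takes the MAX over slices (never the sum), divides by the spill threshold
   * sliceSize * maxLoadFactor, and clamps the result to [0, 1].
   */
  static double spillProximity(long[] sliceUsedBytes, long sliceSize, double maxLoadFactor) {
    long maxUsed = 0L;
    for (long used : sliceUsedBytes) {
      maxUsed = Math.max(maxUsed, used);
    }
    double threshold = sliceSize * maxLoadFactor;
    if (threshold <= 0.0) {
      // Assumption: with no usable threshold (e.g. no grouper initialized), report 0.0.
      return 0.0;
    }
    return Math.min(1.0, Math.max(0.0, maxUsed / threshold));
  }

  public static void main(String[] args) {
    long[] used = {100L, 350L, 200L};
    // The fullest slice (350 bytes) is compared against 1000 * 0.7 = 700 bytes.
    System.out.println(spillProximity(used, 1000L, 0.7));
  }
}
```

Summing the same three slices would give 650 bytes, which says nothing about whether any single slice is close to its threshold. Taking the max keeps the metric tied to the slice that would spill first.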
   
   Operators can read `mergeBuffer/maxSpillProximity` alongside 
`groupBy/spilledQueries`: a value near `1.0` means slices are saturating, and 
the fix is to widen each slice by raising `druid.processing.buffer.sizeBytes` 
or lowering `druid.processing.numThreads`.
   
   #### Changed files
   - `GroupByStatsProvider` — track per-slice max used bytes and the per-slice 
spill threshold; add `getSpillProximity()` (clamped to `[0,1]`); aggregate as a 
max across queries.
   - `SpillingGrouper` — report each slice's peak usage against its spill 
threshold in `close()`.
   - `BufferHashGrouper` — expose `resolveMaxLoadFactor()` so the metric 
denominator matches the grouper's actual spill decision (including the 
default-resolution rule).
   - `GroupByStatsMonitor` — emit `mergeBuffer/maxSpillProximity`.
   - `docs/operations/metrics.md` — document the new metric and clarify the 
slicing semantics of `mergeBuffer/bytesUsed` and `mergeBuffer/maxBytesUsed`.
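
For readers unfamiliar with how a "max across queries" gauge can be collected between monitor emissions, here is one possible shape. It is an assumption for illustration, not the implementation in `GroupByStatsProvider`. `MaxProximityTracker` is a hypothetical class built on the JDK's `DoubleAccumulator`.

```java
import java.util.concurrent.atomic.DoubleAccumulator;

// Hypothetical sketch: keeps the maximum proximity reported by any query since the last reset.
final class MaxProximityTracker {
  // Math::max with identity 0.0, so an emission period with no reports reads 0.0.
  private final DoubleAccumulator max = new DoubleAccumulator(Math::max, 0.0);

  // Called once per query (for example when its groupers close); clamps to [0, 1].
  void report(double proximity) {
    max.accumulate(Math.min(1.0, Math.max(0.0, proximity)));
  }

  // Called by the periodic monitor: returns the max seen and resets for the next period.
  double getAndReset() {
    return max.getThenReset();
  }

  public static void main(String[] args) {
    MaxProximityTracker tracker = new MaxProximityTracker();
    tracker.report(0.2);
    tracker.report(0.9);
    tracker.report(0.4);
    System.out.println(tracker.getAndReset());
  }
}
```

Resetting on read means each emitted value describes only the queries that finished during that emission period, which is one common convention for max-style gauges.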
   
   #### Backward compatibility
   No existing emitted metric name or value changes. `mergeBuffer/bytesUsed` 
and `mergeBuffer/maxBytesUsed` are computed through the same (unchanged) code 
path as before; this PR only adds one new metric.
   
   #### Caveats
   - Limit push-down queries (`LimitedBufferHashGrouper`) spill on a heap-size 
bound rather than filling the buffer arena, so for those queries the proximity 
is a looser proxy; the dominant `BufferHashGrouper` path is exact.
   - A query that acquires merge buffers but never initializes a grouper 
reports `0.0`.
   
   <hr>
   
   This PR has:
   
   - [x] been self-reviewed.
   - [x] added documentation for new or modified features or behaviors.
   - [x] a release note entry in the PR description. (New metric 
`mergeBuffer/maxSpillProximity`; no behavior or config changes.)
   - [x] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) 
is met.
   - [x] added integration tests. (N/A — metric is covered by unit tests.)
   - [x] been tested in a test Druid cluster.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

