Re: [PR] fix: Optimize nullable numeric aggregation for Pooled TopN (druid)

via GitHub Sat, 02 May 2026 03:17:49 -0700


Copilot commented on code in PR #19390:
URL: https://github.com/apache/druid/pull/19390#discussion_r3176491588



##########
processing/src/main/java/org/apache/druid/query/aggregation/SimpleLongAggregatorFactory.java:
##########
@@ -130,6 +130,13 @@ protected boolean useGetObject(ColumnSelectorFactory 
columnSelectorFactory)
     return AggregatorUtil.shouldUseObjectColumnAggregatorWrapper(fieldName, 
columnSelectorFactory);
   }
 
+  @Override
+  @Nullable
+  protected String getInputColumn()
+  {
+    return expression == null ? fieldName : null;
+  }

Review Comment:
   This hook enables the new Pooled TopN nullability shortcut for long 
aggregators too, but the added coverage only exercises 
`DoubleSumAggregatorFactory`. Since `LongSumAggregatorTest` already exists, the 
long path should get the same hasNulls=false / unknown-nullability assertions; 
otherwise a long-specific regression here would go unnoticed.



##########
processing/src/main/java/org/apache/druid/query/aggregation/SimpleFloatAggregatorFactory.java:
##########
@@ -124,6 +124,13 @@ protected boolean useGetObject(ColumnSelectorFactory 
columnSelectorFactory)
     return AggregatorUtil.shouldUseObjectColumnAggregatorWrapper(fieldName, 
columnSelectorFactory);
   }
 
+  @Override
+  @Nullable
+  protected String getInputColumn()
+  {
+    return expression == null ? fieldName : null;
+  }

Review Comment:
   This opt-in hook now affects all float aggregators in Pooled TopN, but the 
PR doesn't add any float-based coverage for `factorizeBufferedForPooledTopN` or 
the pooled intermediate-size calculation. Without one concrete float-aggregator 
test, a regression in this path won't be caught even though similar float 
aggregators already have unit tests in this module.



##########
processing/src/main/java/org/apache/druid/query/topn/TopNQueryEngine.java:
##########
@@ -172,7 +172,10 @@ private TopNMapFn getMapFn(
 
     int numBytesPerRecord = 0;
     for (AggregatorFactory aggregatorFactory : query.getAggregatorSpecs()) {
-      numBytesPerRecord += aggregatorFactory.getMaxIntermediateSizeWithNulls();
+      numBytesPerRecord += 
PooledTopNAlgorithm.getMaxIntermediateSizeWithNullsForPooledTopN(
+          cursorInspector.getColumnInspector(),
+          aggregatorFactory
+      );

Review Comment:
   This changes pooled-vs-heap TopN selection to use the reduced Pooled TopN 
intermediate size, but the PR doesn't add any TopN test that exercises a query 
near the pooled-buffer or aggregate-metric-first thresholds. 
`PooledTopNAlgorithmTest` currently only covers cleanup, so a regression here 
could silently disable the optimization without failing tests.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix: Optimize nullable numeric aggregation for Pooled TopN (druid)

Reply via email to