Re: [PR] Add support for configuring aggregation function parameters in the star-tree index [pinot]

via GitHub Wed, 04 Sep 2024 03:16:50 -0700


yashmayya commented on code in PR #13835:
URL: https://github.com/apache/pinot/pull/13835#discussion_r1743453260



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/startree/AggregationFunctionColumn.java:
##########
@@ -127,9 +149,14 @@ public boolean equals(Object obj) {
     if (this == obj) {
       return true;
     }
-    if (obj instanceof AggregationFunctionColumnPair) {
-      AggregationFunctionColumnPair anotherPair = 
(AggregationFunctionColumnPair) obj;
-      return _functionType == anotherPair._functionType && 
_column.equals(anotherPair._column);
+    if (obj instanceof AggregationFunctionColumn) {
+      AggregationFunctionColumn other = (AggregationFunctionColumn) obj;
+      // TODO: Revisit this since it means that for aggregation functions 
where a certain config parameter need not be
+      // checked to determine whether a query can be served by a star-tree 
index, we won't rebuild a star-tree index
+      // if the parameter value is changed in the index configuration.
+      return _functionType == other._functionType && 
_column.equals(other._column)
+          && 
AggregationFunctionType.compareFunctionParametersForStarTree(_functionType, 
_functionParameters,
+          other._functionParameters) == 0;

Review Comment:
   Hm actually I just realized that there's an assumption probably baked into 
many places that one star-tree can't have multiple pre-aggregations for the 
same aggregation function / column pair (see 
[here](https://github.com/apache/pinot/blob/d411f34b869aaecad803a1dde659a712c8e8c18a/pinot-core/src/main/java/org/apache/pinot/core/startree/plan/StarTreeProjectPlanNode.java#L61-L91)
 for instance where we're using the `functionName__columnName` to retrieve data 
from the star-tree). I think it might be cleaner overall to make this 
assumption explicit now that we're introducing configurable function parameters 
- essentially add a table config validation that prevents a single star-tree 
index configuration (`StarTreeIndexConfig`) from having multiple configurations 
for the same function / column in `StarTreeAggregationConfig`. So, if a user 
wants to have a pre-aggregation on `DISTINCTCOUNTHLL(col, 16)` as well as 
`DISTINCTCOUNTHLL(col, 8)` for some reason, these should be configured in sepa
 rate star-trees. This way we can safely continue to use the existing logic for 
retrieving data from a star-tree index once it has already been matched for a 
query.
   
   I agree with your comment on generalizing the logic of matching aggregations 
in `StarTreeUtils.createStarTreeBasedProjectOperator()` though. I like your 
idea of modifying the `AggregationFunction` interface to allow each aggregation 
to decide whether a particular star-tree index can be used. Although I'm 
thinking that given the above observations on not having multiple entities for 
the same function / column pair in a single star-tree, we can extract the 
function parameters out of this `AggregationFunctionColumn` class (reverting it 
to what it was earlier). We can instead maintain the function parameters 
alongside the existing info in something like a 
`Map<AggregationFunctionColumnPair, Pair<Map<String, Object>, AggregationSpec>` 
(which also seems easier to reason about) in `StarTreeV2BuilderConfig`, 
`StarTreeV2Metadata` etc. WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Add support for configuring aggregation function parameters in the star-tree index [pinot]

Reply via email to