yashmayya commented on code in PR #13835:
URL: https://github.com/apache/pinot/pull/13835#discussion_r1743453260
##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/startree/AggregationFunctionColumn.java:
##########
@@ -127,9 +149,14 @@ public boolean equals(Object obj) {
if (this == obj) {
return true;
}
- if (obj instanceof AggregationFunctionColumnPair) {
- AggregationFunctionColumnPair anotherPair =
(AggregationFunctionColumnPair) obj;
- return _functionType == anotherPair._functionType &&
_column.equals(anotherPair._column);
+ if (obj instanceof AggregationFunctionColumn) {
+ AggregationFunctionColumn other = (AggregationFunctionColumn) obj;
+ // TODO: Revisit this since it means that for aggregation functions
where a certain config parameter need not be
+ // checked to determine whether a query can be served by a star-tree
index, we won't rebuild a star-tree index
+ // if the parameter value is changed in the index configuration.
+ return _functionType == other._functionType &&
_column.equals(other._column)
+ &&
AggregationFunctionType.compareFunctionParametersForStarTree(_functionType,
_functionParameters,
+ other._functionParameters) == 0;
Review Comment:
Hm actually I just realized that there's an assumption probably baked into
many places that one star-tree can't have multiple pre-aggregations for the
same aggregation function / column pair (see
[here](https://github.com/apache/pinot/blob/d411f34b869aaecad803a1dde659a712c8e8c18a/pinot-core/src/main/java/org/apache/pinot/core/startree/plan/StarTreeProjectPlanNode.java#L61-L91)
for instance where we're using the `functionName__columnName` to retrieve data
from the star-tree). I think it might be cleaner overall to make this
assumption explicit now that we're introducing configurable function parameters
- essentially add a table config validation that prevents a single star-tree
index configuration (`StarTreeIndexConfig`) from having multiple configurations
for the same function / column in `StarTreeAggregationConfig`. So, if a user
wants to have a pre-aggregation on `DISTINCTCOUNTHLL(col, 16)` as well as
`DISTINCTCOUNTHLL(col, 8)` for some reason, these should be configured in sepa
rate star-trees. This way we can safely continue to use the existing logic for
retrieving data from a star-tree index once it has already been matched for a
query.
I agree with your comment on generalizing the logic of matching aggregations
in `StarTreeUtils.createStarTreeBasedProjectOperator()` though. I like your
idea of modifying the `AggregationFunction` interface to allow each aggregation
to decide whether a particular star-tree index can be used. Although I'm
thinking that given the above observations on not having multiple entities for
the same function / column pair in a single star-tree, we can extract the
function parameters out of this `AggregationFunctionColumn` class (reverting it
to what it was earlier). We can instead maintain the function parameters
alongside the existing info in something like a
`Map<AggregationFunctionColumnPair, Pair<Map<String, Object>, AggregationSpec>`
(which also seems easier to reason about) in `StarTreeV2BuilderConfig`,
`StarTreeV2Metadata` etc. WDYT?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]