yashmayya commented on code in PR #14211:
URL: https://github.com/apache/pinot/pull/14211#discussion_r1797322234
##########
pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java:
##########
@@ -458,6 +458,15 @@ public static class QueryOptionKey {
// executed in an Unbounded FCFS fashion. However, secondary
workloads are executed in a constrainted FCFS
// fashion with limited compute.
public static final String IS_SECONDARY_WORKLOAD =
"isSecondaryWorkload";
+
+ // For group by queries with only filtered aggregations (and no
non-filtered aggregations), the v1 query engine
+ // does not compute all groups by default - instead, it will only
compute the groups from the filtered result
+ // set (i.e., union of the main query filter and all the individual
aggregation filters). This is good for
+ // performance, since indexes can be used, but the result won't match
standard SQL behavior (where all groups
+ // are expected to be returned). If this option is set to true, the v1
query engine will compute all groups for
+ // group by queries with only filtered aggregations. This could
require a full scan if the main query does not
+ // have a filter and performance could be much worse, but the result
will be SQL compliant.
+ public static final String FILTERED_AGGREGATIONS_COMPUTE_ALL_GROUPS =
"filteredAggregationsComputeAllGroups";
Review Comment:
> Since the overhead seems fine (no extra aggregate needed), I'd suggest
doing the standard SQL behavior by default, and add an option to skip the all
groups computation
I think it makes sense to have standard SQL behavior by default and have an
option to use the more performant method of only computing required groups. But
won't the overhead potentially be quite high for a query like `SELECT SUM(X)
FILTER (WHERE Y = 1) FROM mytable` and there's an inverted index on the filter
column `Y`? If we want to compute all groups, it'll involve a scan over all
docs whereas with the previous default behavior, no scan would be required and
we'd only need to do the aggregation over the docs matching the filter computed
using the inverted index. There won't be any additional _aggregation_ being
done but the scan would still be expensive right?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]