Re: [PR] Compute all groups for group by queries with only filtered aggregations [pinot]

via GitHub Fri, 11 Oct 2024 11:56:24 -0700


yashmayya commented on code in PR #14211:
URL: https://github.com/apache/pinot/pull/14211#discussion_r1797322234



##########
pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java:
##########
@@ -458,6 +458,15 @@ public static class QueryOptionKey {
         // executed in an  Unbounded FCFS fashion. However, secondary 
workloads are executed in a constrainted FCFS
         // fashion with limited compute.
         public static final String IS_SECONDARY_WORKLOAD = 
"isSecondaryWorkload";
+
+        // For group by queries with only filtered aggregations (and no 
non-filtered aggregations), the v1 query engine
+        // does not compute all groups by default - instead, it will only 
compute the groups from the filtered result
+        // set (i.e., union of the main query filter and all the individual 
aggregation filters). This is good for
+        // performance, since indexes can be used, but the result won't match 
standard SQL behavior (where all groups
+        // are expected to be returned). If this option is set to true, the v1 
query engine will compute all groups for
+        // group by queries with only filtered aggregations. This could 
require a full scan if the main query does not
+        // have a filter and performance could be much worse, but the result 
will be SQL compliant.
+        public static final String FILTERED_AGGREGATIONS_COMPUTE_ALL_GROUPS = 
"filteredAggregationsComputeAllGroups";

Review Comment:
   > Since the overhead seems fine (no extra aggregate needed), I'd suggest 
doing the standard SQL behavior by default, and add an option to skip the all 
groups computation
   
   I think it makes sense to have standard SQL behavior by default and have an 
option to use the more performant method of only computing required groups. But 
won't the overhead potentially be quite high for a query like `SELECT SUM(X) 
FILTER (WHERE Y = 1) FROM mytable` and there's an inverted index on the filter 
column `Y`? If we want to compute all groups, it'll involve a scan over all 
docs whereas with the previous default behavior, no scan would be required and 
we'd only need to do the aggregation over the docs matching the filter computed 
using the inverted index. There won't be any additional _aggregation_ being 
done but the scan would still be expensive right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Compute all groups for group by queries with only filtered aggregations [pinot]

Reply via email to