[PR] Allow using 'serverReturnFinalResult' to optimize server partitioned table [pinot]

via GitHub Wed, 22 May 2024 21:56:02 -0700


Jackie-Jiang opened a new pull request, #13208:
URL: https://github.com/apache/pinot/pull/13208


   When a column is partitioned across servers (i.e. the same value always 
shows up on the same server), the following queries can be optimized by asking 
each server to return the final aggregate result directly instead of the 
intermediate aggregate result.
   1: `SELECT DISTINCT_COUNT(partitionedCol) FROM myTable`
   2: `SELECT DISTINCT_COUNT(partitionedCol) FROM myTable GROUP BY col`
   3: `SELECT AGG(col) FROM myTable GROUP BY partitionedCol`
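
As a rough illustration of why query 1 qualifies, here is a toy simulation in plain Python, not Pinot code. The helper names (`split_partitioned`, `split_round_robin`, `merge_intermediate`, `merge_final`) and the modulo routing are assumptions made up for this sketch. It shows that adding per-server final distinct counts matches the exact answer only when equal values always land on the same server.

```python
# Toy model (hypothetical, not Pinot internals): a "server" is a list of
# values of partitionedCol that were routed to it.
ROWS = [1, 2, 2, 3, 4, 4, 4, 5, 6, 7, 7]
NUM_SERVERS = 3


def split_partitioned(rows, num_servers):
    """Route each value by value % num_servers, so equal values share a server."""
    servers = [[] for _ in range(num_servers)]
    for value in rows:
        servers[value % num_servers].append(value)
    return servers


def split_round_robin(rows, num_servers):
    """Route rows by position, so equal values may land on different servers."""
    servers = [[] for _ in range(num_servers)]
    for index, value in enumerate(rows):
        servers[index % num_servers].append(value)
    return servers


def merge_intermediate(servers):
    """Servers ship their distinct-value sets; the merger unions them."""
    merged = set()
    for values in servers:
        merged |= set(values)
    return len(merged)


def merge_final(servers):
    """Servers ship only their final counts; the merger adds them up."""
    return sum(len(set(values)) for values in servers)


partitioned = split_partitioned(ROWS, NUM_SERVERS)
unpartitioned = split_round_robin(ROWS, NUM_SERVERS)

exact = len(set(ROWS))
partitioned_final = merge_final(partitioned)      # matches the exact count
unpartitioned_final = merge_final(unpartitioned)  # overcounts shared values
```

In the partitioned layout no value is counted on two servers, so the per-server counts can simply be added. In the round-robin layout the same value is counted once per server that holds it, which is why the intermediate (set) result is needed there.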
   
   For all 3 queries, we can ask the server to return the final aggregate 
result, but there is a difference between 2 and 3. For 2, the server can return 
the final aggregate result but should still keep enough groups, because that 
result is not the global final result, only the final result for a partition. 
For 3, the server only needs to keep `LIMIT` groups, because the aggregate 
result is the global final result for the group.
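
The difference between 2 and 3 can be sketched with another toy simulation in plain Python. Everything here is hypothetical: the `top_k` and `merge_sum` helpers, the sample numbers, and the assumption that groups are ranked by aggregate value in descending order. Each dict maps a group key to the final aggregate value a server computed.

```python
from collections import defaultdict

LIMIT = 2


def top_k(groups, k):
    """Keep the k groups with the largest values (ties broken by key)."""
    ranked = sorted(groups.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:k])


def merge_sum(server_results):
    """Merge per-server results by adding values that share a group key."""
    merged = defaultdict(int)
    for result in server_results:
        for key, value in result.items():
            merged[key] += value
    return dict(merged)


# Query 3: the group key is the partitioned column, so every group lives on
# exactly one server and its value is already the global final result.
q3_servers = [{"p1": 10, "p2": 3, "p5": 2}, {"p3": 7, "p4": 1}]
q3_full = top_k(merge_sum(q3_servers), LIMIT)
q3_trimmed = top_k(merge_sum([top_k(s, LIMIT) for s in q3_servers]), LIMIT)

# Query 2: the group key is not partitioned, so the same group shows up on
# several servers and each value is final only for that partition.
q2_servers = [{"x": 5, "y": 4, "z": 3}, {"z": 4, "w": 3, "y": 1}]
q2_full = top_k(merge_sum(q2_servers), LIMIT)
q2_trimmed = top_k(merge_sum([top_k(s, LIMIT) for s in q2_servers]), LIMIT)
```

With these sample values, trimming each server to `LIMIT` groups leaves the query 3 answer unchanged, while for query 2 it drops group `z` on the first server and changes the merged ranking. That is why the server has to keep more than `LIMIT` groups in case 2.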
   
   In this PR, the user can `SET serverReturnFinalResult = true;` to accelerate 
1 and 3, and `SET serverReturnFinalResultKeyUnpartitioned = true;` to 
accelerate 2.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


