[ https://issues.apache.org/jira/browse/HIVE-26524?focusedWorklogId=811185&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-811185 ]
ASF GitHub Bot logged work on HIVE-26524:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/Sep/22 12:09
            Start Date: 22/Sep/22 12:09
    Worklog Time Spent: 10m
      Work Description: kasakrisz commented on code in PR #3588:
URL: https://github.com/apache/hive/pull/3588#discussion_r977574485


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRewriteToDataSketchesRules.java:
##########
@@ -336,16 +336,13 @@ protected VBuilderPAP(Aggregate aggregate, Project project, String sketchClass)

     @Override
     boolean isApplicable(AggregateCall aggCall) {
-      if ((aggInput instanceof Project)
-          && !aggCall.isDistinct() && aggCall.getArgList().size() == 4
+      if ((aggInput != null)
+          && !aggCall.isDistinct() && aggCall.getArgList().size() == 1

Review Comment:
This rule already has a good comment: https://github.com/apache/hive/blob/a017e54c98c76ccf0c185b47533b336b0a398dc7/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRewriteToDataSketchesRules.java#L296-L303

Before this patch, function calls like
```
percentile_disc(0.2) within group (order by id)
```
were transformed at the AST level into a 4-parameter function:
```
percentile_disc(0.2, id, asc, nulls_last)
```
and the collation was not represented in the CBO plan at all. With this patch, the collation is properly set on these aggregate calls in the CBO plan, so this rule must expect that form.
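A minimal sketch of the old and new call shapes that the arity check has to tell apart. It uses a toy model: `AggCall`, `FieldCollation` and `isApplicable` below are hypothetical stand-ins, not Calcite's `AggregateCall`/`RelCollation` API. The sketch also omits the `aggInput != null` part of the real condition. It assumes the single remaining argument is the fraction (0.2) and that the ORDER BY key is carried only by the collation.

```java
import java.util.List;

// Hypothetical, simplified stand-ins for Calcite's AggregateCall / RelCollation,
// used only to illustrate the shape of the applicability check.
public class PercentileDiscCheck {

  /** One ORDER BY key of a WITHIN GROUP clause (hypothetical). */
  record FieldCollation(int fieldIndex, boolean descending, boolean nullsLast) {}

  /** Minimal model of an aggregate call: its arguments plus its collation (hypothetical). */
  record AggCall(String name, boolean distinct, List<Integer> argList,
                 List<FieldCollation> fieldCollations) {}

  /**
   * Mirrors the patched condition (minus the aggInput check): a non-distinct
   * call with exactly one argument and exactly one ORDER BY key.
   */
  static boolean isApplicable(AggCall call) {
    return !call.distinct()
        && call.argList().size() == 1
        && call.fieldCollations().size() == 1;
  }

  public static void main(String[] args) {
    // New shape: percentile_disc(0.2) within group (order by id).
    // Assumed: arg 0 is the fraction; field 1 (id) is the single ORDER BY key.
    AggCall newForm = new AggCall("percentile_disc", false, List.of(0),
        List.of(new FieldCollation(1, false, true)));

    // Old shape: percentile_disc(0.2, id, asc, nulls_last) as four plain
    // arguments, with no collation in the plan at all.
    AggCall oldForm = new AggCall("percentile_disc", false, List.of(0, 1, 2, 3),
        List.of());

    System.out.println(isApplicable(newForm)); // true
    System.out.println(isApplicable(oldForm)); // false
  }
}
```

The old four-argument shape now fails the check. This is the point of the review comment: the rule has to match the collation-based representation instead.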
An extra predicate, which checks the number of ORDER BY keys, was added to this condition:
```
&& aggCall.collation.getFieldCollations().size() == 1) {
```
It must be 1 in the case of `percentile_disc`.




Issue Time Tracking
-------------------

    Worklog Id:     (was: 811185)
    Time Spent: 2h 20m  (was: 2h 10m)

> Use Calcite to remove sections of a query plan known never produces rows
> ------------------------------------------------------------------------
>
>                 Key: HIVE-26524
>                 URL: https://issues.apache.org/jira/browse/HIVE-26524
>             Project: Hive
>          Issue Type: Improvement
>          Components: CBO
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Calcite has a set of rules to remove sections of a query plan known to never
> produce any rows. In some cases the whole plan can be removed. Such plans
> are represented by a single {{Values}} operator with no tuples, e.g.:
> {code:java}
> select y + 1 from (select a1 y, b1 z from t1 where b1 > 10) q WHERE 1=0
> {code}
> {code:java}
> HiveValues(tuples=[[]])
> {code}
> In other cases, when the plan has outer joins or set operators, some branches
> can be replaced with empty values, and going further, in some cases the
> join/set operator itself can be removed:
> {code:java}
> select a2, b2 from t2 where 1=0
> union
> select a1, b1 from t1
> {code}
> {code:java}
> HiveAggregate(group=[{0, 1}])
>   HiveTableScan(table=[[default, t1]], table:alias=[t1])
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
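The issue describes how "produces no rows" propagates through a plan. A minimal sketch of that idea follows, under a toy model. `Rel`, `Scan`, `Filter`, `Project`, `Aggregate`, `Union`, `EmptyValues` and `prune` are all hypothetical. They are not Calcite's `RelNode` API, and they do not reproduce how Calcite's own rules (e.g. `PruneEmptyRules`) are registered or fired in the planner. Outer joins are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

public class PruneEmptySketch {

  // Hypothetical, heavily simplified operators (not Calcite's RelNode API).
  interface Rel {}
  record Scan(String table) implements Rel {}
  record Filter(Rel input, boolean alwaysFalse) implements Rel {}
  record Project(Rel input) implements Rel {}
  record Aggregate(Rel input) implements Rel {}            // GROUP BY all columns
  record Union(List<Rel> inputs, boolean all) implements Rel {}
  record EmptyValues() implements Rel {}                    // like HiveValues(tuples=[[]])

  static boolean isEmpty(Rel rel) {
    return rel instanceof EmptyValues;
  }

  /** Bottom-up rewrite that replaces subtrees known to produce no rows. */
  static Rel prune(Rel rel) {
    if (rel instanceof Filter f) {
      Rel in = prune(f.input());
      // An always-false condition, or an empty input, yields no rows.
      return (f.alwaysFalse() || isEmpty(in)) ? new EmptyValues() : new Filter(in, false);
    }
    if (rel instanceof Project p) {
      Rel in = prune(p.input());
      return isEmpty(in) ? new EmptyValues() : new Project(in);
    }
    if (rel instanceof Aggregate a) {
      // Valid only because this model's Aggregate always has group keys;
      // a global aggregate (no GROUP BY) over empty input still returns one row.
      Rel in = prune(a.input());
      return isEmpty(in) ? new EmptyValues() : new Aggregate(in);
    }
    if (rel instanceof Union u) {
      List<Rel> kept = new ArrayList<>();
      for (Rel input : u.inputs()) {
        Rel in = prune(input);
        if (!isEmpty(in)) {
          kept.add(in); // drop branches that can never produce rows
        }
      }
      if (kept.isEmpty()) {
        return new EmptyValues();
      }
      if (kept.size() == 1) {
        // UNION ALL of one branch is the branch itself; UNION (distinct)
        // must still remove duplicates, hence the Aggregate.
        return u.all() ? kept.get(0) : new Aggregate(kept.get(0));
      }
      return new Union(kept, u.all());
    }
    return rel; // Scan and EmptyValues are leaves
  }

  public static void main(String[] args) {
    // select y + 1 from (select a1 y, b1 z from t1 where b1 > 10) q WHERE 1=0
    Rel q1 = new Project(new Filter(new Project(new Filter(new Scan("t1"), false)), true));
    System.out.println(prune(q1));

    // select a2, b2 from t2 where 1=0 union select a1, b1 from t1
    Rel q2 = new Union(List.of(
        new Project(new Filter(new Scan("t2"), true)),
        new Project(new Scan("t1"))), false);
    System.out.println(prune(q2));
  }
}
```

Here the first query collapses to `EmptyValues`. The union reduces to an `Aggregate` over the surviving `t1` branch, which matches the shape of the `HiveAggregate`/`HiveTableScan` plan above. Hive's plan also drops the identity projection, and this sketch does not model that step.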