[ 
https://issues.apache.org/jira/browse/HIVE-26524?focusedWorklogId=811185&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-811185
 ]

ASF GitHub Bot logged work on HIVE-26524:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/Sep/22 12:09
            Start Date: 22/Sep/22 12:09
    Worklog Time Spent: 10m 
      Work Description: kasakrisz commented on code in PR #3588:
URL: https://github.com/apache/hive/pull/3588#discussion_r977574485


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRewriteToDataSketchesRules.java:
##########
@@ -336,16 +336,13 @@ protected VBuilderPAP(Aggregate aggregate, Project project, String sketchClass)
 
       @Override
       boolean isApplicable(AggregateCall aggCall) {
-        if ((aggInput instanceof Project)
-            && !aggCall.isDistinct() && aggCall.getArgList().size() == 4
+        if ((aggInput != null)
+            && !aggCall.isDistinct() && aggCall.getArgList().size() == 1

Review Comment:
   This rule already has a good comment
   
https://github.com/apache/hive/blob/a017e54c98c76ccf0c185b47533b336b0a398dc7/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRewriteToDataSketchesRules.java#L296-L303
   
   Before this patch, function calls like
   ```
   percentile_disc(0.2) within group (order by id)
   ```
   were transformed at the AST level into a 4-parameter function call:
   ```
   percentile_disc(0.2, id, asc, nulls_last)
   ```
   and the collation was not represented in the CBO plan at all.
   
   With this patch, the collation is properly set on these aggregate calls in the CBO plan, so this rule must expect that form.
   An extra predicate was added to this condition, which checks the number of order by keys:
   ```
   && aggCall.collation.getFieldCollations().size() == 1) {
   ```
   It must be 1 in the case of `percentile_disc`.
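
The combined condition can be sketched in isolation as follows. This is only an illustration: `SimpleAggCall` is a hypothetical stand-in that models the three properties the predicate reads, not Calcite's `AggregateCall`, and `PercentileDiscCheck` is likewise a made-up name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class PercentileDiscCheck {

  // Hypothetical stand-in for Calcite's AggregateCall (not the real class).
  static final class SimpleAggCall {
    final boolean distinct;
    final List<Integer> argList;        // input column indexes passed as arguments
    final List<Integer> collationKeys;  // input column indexes of the order by keys

    SimpleAggCall(boolean distinct, List<Integer> argList, List<Integer> collationKeys) {
      this.distinct = distinct;
      this.argList = argList;
      this.collationKeys = collationKeys;
    }
  }

  // Same shape as the patched condition: an input exists, the call is not
  // DISTINCT, it has exactly one argument and exactly one order by key.
  static boolean isApplicable(Object aggInput, SimpleAggCall call) {
    return aggInput != null
        && !call.distinct
        && call.argList.size() == 1
        && call.collationKeys.size() == 1;
  }

  public static void main(String[] args) {
    Object input = new Object();
    // New form: percentile_disc(0.2) with the collation carried separately.
    SimpleAggCall newForm =
        new SimpleAggCall(false, Arrays.asList(0), Arrays.asList(1));
    // Old form: four arguments and no collation in the plan.
    SimpleAggCall oldForm =
        new SimpleAggCall(false, Arrays.asList(0, 1, 2, 3), Collections.<Integer>emptyList());
    System.out.println("new form applicable: " + isApplicable(input, newForm));
    System.out.println("old form applicable: " + isApplicable(input, oldForm));
  }
}
```

Under these assumptions the new single-argument form passes the check and the old 4-parameter form does not.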





Issue Time Tracking
-------------------

    Worklog Id:     (was: 811185)
    Time Spent: 2h 20m  (was: 2h 10m)

> Use Calcite to remove sections of a query plan known never produces rows
> ------------------------------------------------------------------------
>
>                 Key: HIVE-26524
>                 URL: https://issues.apache.org/jira/browse/HIVE-26524
>             Project: Hive
>          Issue Type: Improvement
>          Components: CBO
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Calcite has a set of rules to remove sections of a query plan that are known to never 
> produce any rows. In some cases the whole plan can be removed. Such plans 
> are represented by a single {{Values}} operator with no tuples, e.g.:
> {code:java}
> select y + 1 from (select a1 y, b1 z from t1 where b1 > 10) q WHERE 1=0
> {code}
> {code:java}
> HiveValues(tuples=[[]])
> {code}
> In other cases, when the plan has outer join or set operators, some branches can be 
> replaced with empty values. In some of these cases the join/set operator itself 
> can then be removed:
> {code:java}
> select a2, b2 from t2 where 1=0
> union
> select a1, b1 from t1
> {code}
> {code:java}
> HiveAggregate(group=[{0, 1}])
>   HiveTableScan(table=[[default, t1]], table:alias=[t1])
> {code}
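
The pruning idea described in the issue can be sketched on a toy plan tree. Everything in this sketch is hypothetical: `EmptyPruneSketch` and its node classes are illustrative only and are not the Calcite or Hive operators, and `Distinct` merely stands in for an aggregate that groups on all columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class EmptyPruneSketch {

  // Hypothetical miniature plan model; these are not Calcite or Hive classes.
  abstract static class PlanNode { }

  static final class Scan extends PlanNode {
    final String table;
    Scan(String table) { this.table = table; }
    @Override public String toString() { return "Scan(" + table + ")"; }
  }

  static final class EmptyValues extends PlanNode {
    @Override public String toString() { return "Values(tuples=[])"; }
  }

  static final class Filter extends PlanNode {
    final PlanNode input;
    final boolean alwaysFalse;  // true for predicates such as 1=0
    Filter(PlanNode input, boolean alwaysFalse) {
      this.input = input;
      this.alwaysFalse = alwaysFalse;
    }
    @Override public String toString() { return "Filter(" + input + ")"; }
  }

  // Stands in for an aggregate grouping on every column (duplicate removal).
  static final class Distinct extends PlanNode {
    final PlanNode input;
    Distinct(PlanNode input) { this.input = input; }
    @Override public String toString() { return "Distinct(" + input + ")"; }
  }

  static final class UnionDistinct extends PlanNode {
    final List<PlanNode> inputs;
    UnionDistinct(List<PlanNode> inputs) { this.inputs = inputs; }
    @Override public String toString() { return "Union(" + inputs + ")"; }
  }

  // Bottom-up rewrite: replace subtrees that cannot produce rows with
  // EmptyValues, and drop empty branches of a union.
  static PlanNode prune(PlanNode node) {
    if (node instanceof Filter) {
      Filter f = (Filter) node;
      PlanNode in = prune(f.input);
      if (f.alwaysFalse || in instanceof EmptyValues) {
        return new EmptyValues();
      }
      return new Filter(in, false);
    }
    if (node instanceof Distinct) {
      PlanNode in = prune(((Distinct) node).input);
      return in instanceof EmptyValues ? in : new Distinct(in);
    }
    if (node instanceof UnionDistinct) {
      List<PlanNode> kept = new ArrayList<>();
      for (PlanNode in : ((UnionDistinct) node).inputs) {
        PlanNode pruned = prune(in);
        if (!(pruned instanceof EmptyValues)) {
          kept.add(pruned);
        }
      }
      if (kept.isEmpty()) {
        return new EmptyValues();
      }
      // A single surviving branch still needs duplicate removal.
      return kept.size() == 1 ? new Distinct(kept.get(0)) : new UnionDistinct(kept);
    }
    return node;  // scans and already-empty values are left untouched
  }

  public static void main(String[] args) {
    PlanNode union = new UnionDistinct(Arrays.<PlanNode>asList(
        new Filter(new Scan("t2"), true), new Scan("t1")));
    System.out.println(prune(union));
  }
}
```

In this toy model the union with one always-false branch collapses to a duplicate-removing node over the scan of `t1`, which has the same shape as the `HiveAggregate` over `HiveTableScan` plan quoted above.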



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
