
ASF GitHub Bot logged work on HIVE-26671:

                Author: ASF GitHub Bot
            Created on: 27/Oct/22 12:16
            Start Date: 27/Oct/22 12:16
    Worklog Time Spent: 10m 
      Work Description: kasakrisz commented on PR #3706:
URL: https://github.com/apache/hive/pull/3706#issuecomment-1293439464

   Thanks @scarlin-cloudera for investigating this issue. This patch is a 
possible solution.
   I would like to share another approach: IIUC the issue is caused by the 
extra key column that the distinct adds to the RS located in the mapper. 
   Without TNK the plan of the query mentioned in the jira looks like this:
         GBY (l_orderkey, l_partkey)
           RS (l_orderkey, l_partkey)
     GBY (KEY._col0)
       RS (col0)
   A TNK is created on top of each RS, with its keys coming from the 
corresponding RS. Both TNKs are then pushed down until they reach the TS, and 
at TNK merging the one with 2 keys is accepted.
   How about skipping TNK creation in `TopNKeyProcessor` if the RS has keys 
defined because of distinct, and keeping the existing behavior when no 
distinct aggregates are present?
   I would expect that only TNK (l_orderkey) remains.
   What do you think?
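
   To make the difference between the two TNK variants concrete, here is a 
small Python sketch. It is only a simplified model of a top-n-key filter, not 
Hive's implementation: `top_n_key_filter`, `run_query` and the sample rows are 
made up for illustration, and the filter simply drops rows whose key cannot be 
among the n smallest keys seen so far.

```python
from bisect import insort
from typing import Iterable, Iterator, List, Sequence, Tuple

Row = Tuple[int, int, str]  # (l_orderkey, l_partkey, l_shipdate)

# Hypothetical sample data, not taken from the jira.
ROWS: List[Row] = [
    (1, 10, "1996-03-13"),
    (1, 20, "1996-04-12"),
    (1, 30, "1996-01-29"),
    (2, 40, "1997-01-28"),
    (3, 50, "1994-02-02"),
]


def top_n_key_filter(rows: Iterable[Row], key_cols: Sequence[int], n: int) -> Iterator[Row]:
    """Simplified streaming model of a TNK: forward a row only if its key
    is among the n smallest distinct keys seen so far."""
    top: List[tuple] = []  # sorted list of at most n distinct keys
    for row in rows:
        key = tuple(row[i] for i in key_cols)
        if key in top:
            yield row
        elif len(top) < n:
            insort(top, key)
            yield row
        elif key < top[-1]:
            top.pop()  # evict the current largest key
            insort(top, key)
            yield row
        # otherwise the key cannot be in the top n, so the row is dropped


def run_query(rows: Iterable[Row], limit: int = 2) -> List[Tuple[int, str, int]]:
    """select l_orderkey, min(l_shipdate), count(distinct l_partkey)
    ... group by l_orderkey order by l_orderkey limit <limit>"""
    groups: dict = {}
    for orderkey, partkey, shipdate in rows:
        dates, parts = groups.setdefault(orderkey, ([], set()))
        dates.append(shipdate)
        parts.add(partkey)
    result = [(k, min(d), len(p)) for k, (d, p) in sorted(groups.items())]
    return result[:limit]


no_tnk = run_query(ROWS)
tnk_one_key = run_query(top_n_key_filter(ROWS, (0,), 2))     # TNK (l_orderkey)
tnk_two_keys = run_query(top_n_key_filter(ROWS, (0, 1), 2))  # TNK (l_orderkey, l_partkey)
```

   In this model the single-key filter keeps every row of the first two 
l_orderkey groups, so the result matches the unfiltered query. The two-key 
filter keeps only the first two (l_orderkey, l_partkey) pairs, so both the 
min and the distinct count of the first group change and the second group 
is lost.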

Issue Time Tracking

    Worklog Id:     (was: 820962)
    Time Spent: 40m  (was: 0.5h)

> Incorrect results for group by/order by/limit query with 2 aggregates
> ---------------------------------------------------------------------
>                 Key: HIVE-26671
>                 URL: https://issues.apache.org/jira/browse/HIVE-26671
>             Project: Hive
>          Issue Type: Bug
>          Components: Operators
>            Reporter: Steve Carlin
>            Assignee: Steve Carlin
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
> Grabbed this query from the Impala test suite.  It is a query run off of 
> tpch tables, but it's not really super special.  You will need a lot of data 
> to reproduce this, though.
> select
> l_orderkey,
> min(l_shipdate) as flt,
> count(distinct l_partkey) as cnl 
> from lineitem
> group by l_orderkey order by l_orderkey limit 2;
> The issue is with the Top N Key operator optimizer. The Top N Key operator is 
> the first operator after the Table Scan.  The sort key is on both the 
> l_orderkey and l_partkey columns, but this means that the second sort key 
> might not be forwarded.

This message was sent by Atlassian Jira
