[ 
https://issues.apache.org/jira/browse/HIVE-8225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144025#comment-14144025
 ] 

Sergey Shelukhin commented on HIVE-8225:
----------------------------------------

Minimal query to reproduce the issue:
{noformat}
select unionsrc.key FROM (select 'tst1' as key, count(1) as value from src s1) unionsrc;
{noformat}
This returns 500 rows of tst1, whereas it should return just 1.
If you add value to the select list:
{noformat}
select unionsrc.key,unionsrc.value FROM (select 'tst1' as key, count(1) as value from src s1) unionsrc;
{noformat}
the problem disappears.
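
For reference, the expected semantics can be checked outside Hive. The sketch below is not Hive and says nothing about CBO; it assumes SQLite (via Python's standard sqlite3 module) as a stand-in engine and a hypothetical 500-row src table with key and value columns, and runs the same two queries. An aggregate without GROUP BY should yield exactly one row whether or not the outer query selects the aggregate column.

```python
import sqlite3


def run_repro_queries(row_count=500):
    """Run both queries from the comment against an in-memory SQLite table.

    Assumption: `src` is modelled as a two-column table with `row_count`
    rows; the real Hive test table is not reproduced here.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src (key TEXT, value TEXT)")
    conn.executemany(
        "INSERT INTO src VALUES (?, ?)",
        [(str(i), "val_%d" % i) for i in range(row_count)],
    )

    # Query 1: only the constant column is selected from the subquery.
    key_only = conn.execute(
        "select unionsrc.key FROM "
        "(select 'tst1' as key, count(1) as value from src s1) unionsrc"
    ).fetchall()

    # Query 2: the aggregate column is selected as well.
    key_and_value = conn.execute(
        "select unionsrc.key, unionsrc.value FROM "
        "(select 'tst1' as key, count(1) as value from src s1) unionsrc"
    ).fetchall()

    conn.close()
    return key_only, key_and_value


if __name__ == "__main__":
    key_only, key_and_value = run_repro_queries()
    print(len(key_only), key_only)
    print(len(key_and_value), key_and_value)
```

Under these assumptions both queries return a single row, which is the behaviour the key-only query should also have in Hive.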

The ASTs for the two queries, both before and after CBO, differ only by the 
last select expression; however, as the trees below show, in CBO the absence 
of that expression causes the count to disappear completely.
Initial AST for the 2nd query (correct result):
{noformat}
TOK_QUERY
   TOK_FROM
      TOK_SUBQUERY
         TOK_QUERY
            TOK_FROM
               TOK_TABREF
                  TOK_TABNAME
                     src
                  s1
            TOK_INSERT
               TOK_DESTINATION
                  TOK_DIR
                     TOK_TMP_FILE
               TOK_SELECT
                  TOK_SELEXPR
                     'tst1'
                     key
                  TOK_SELEXPR
                     TOK_FUNCTION
                        count
                        1
                     value
         unionsrc
   TOK_INSERT
      TOK_DESTINATION
         TOK_DIR
            TOK_TMP_FILE
      TOK_SELECT
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  unionsrc
               key
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  unionsrc
               value
{noformat}

Post-CBO AST for the 2nd query:
{noformat}
TOK_QUERY
   TOK_FROM
      TOK_SUBQUERY
         TOK_QUERY
            TOK_FROM
               TOK_SUBQUERY
                  TOK_QUERY
                     TOK_FROM
                        TOK_TABREF
                           TOK_TABNAME
                              default
                              src
                           s1
                     TOK_INSERT
                        TOK_DESTINATION
                           TOK_DIR
                              TOK_TMP_FILE
                        TOK_SELECT
                           TOK_SELEXPR
                              0
                              DUMMY
                  $hdt$_2
            TOK_INSERT
               TOK_DESTINATION
                  TOK_DIR
                     TOK_TMP_FILE
               TOK_SELECT
                  TOK_SELEXPR
                     1
                     $f0
         $hdt$_3
   TOK_INSERT
      TOK_DESTINATION
         TOK_DIR
            TOK_TMP_FILE
      TOK_SELECT
         TOK_SELEXPR
            'tst1'
            unionsrc.key
         TOK_SELEXPR
            TOK_FUNCTION
               count
               .
                  TOK_TABLE_OR_COL
                     $hdt$_3
                  $f0
            unionsrc.value
{noformat}
Note where the count is: it sits in the outer TOK_SELEXPR for unionsrc.value. 
In the key-only query, where this TOK_SELEXPR is gone, there is no count, so 
the result changes.
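
A quick way to compare AST dumps for this kind of problem is to check mechanically whether a given function call survives in the tree. The sketch below is a hypothetical helper, not part of Hive; it assumes the dump format shown above, where a TOK_FUNCTION line is immediately followed by a line holding the function name. The sample text is the outer TOK_SELECT of the post-CBO AST above.

```python
def has_function(ast_dump, name):
    """Return True if the dump contains a TOK_FUNCTION node named `name`.

    Assumption: the function name is the first child, printed on the line
    right after TOK_FUNCTION, as in the dumps in this comment.
    """
    tokens = [line.strip() for line in ast_dump.splitlines() if line.strip()]
    return any(
        tok == "TOK_FUNCTION" and nxt == name
        for tok, nxt in zip(tokens, tokens[1:])
    )


# Outer TOK_SELECT copied from the post-CBO AST of the 2nd query.
POST_CBO_OUTER_SELECT = """
      TOK_SELECT
         TOK_SELEXPR
            'tst1'
            unionsrc.key
         TOK_SELEXPR
            TOK_FUNCTION
               count
               .
                  TOK_TABLE_OR_COL
                     $hdt$_3
                  $f0
            unionsrc.value
"""

if __name__ == "__main__":
    print(has_function(POST_CBO_OUTER_SELECT, "count"))
```

Running the same check on the post-CBO dump of the key-only query would show whether the count call is present anywhere in that tree.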

> CBO trunk merge: union11 test fails due to incorrect plan
> ---------------------------------------------------------
>
>                 Key: HIVE-8225
>                 URL: https://issues.apache.org/jira/browse/HIVE-8225
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>
> The result changes to as if the union didn't have count() inside. The issue 
> can be fixed by using srcunion.value outside the subquery in count (replace 
> count(1) with count(srcunion.value)). Otherwise, it looks like count(1) node 
> from union-ed queries is not present in AST at all, which might cause this 
> result.
> Interestingly, adding group by to each query in a union produces completely 
> weird result (count(1) is 309 for each key, whereas it should be 1 and the 
> "logical" incorrect value if internal count is lost is 500)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)