[ https://issues.apache.org/jira/browse/HIVE-8225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144025#comment-14144025 ]
Sergey Shelukhin edited comment on HIVE-8225 at 9/22/14 11:29 PM: ------------------------------------------------------------------ Minimum query to reproduce the issue: {noformat} select unionsrc.key FROM (select 'tst1' as key, count(1) as value from src s1) unionsrc; {noformat} - returns 500 rows of tst1 whereas it should return just 1. If you add value to select list - {noformat} select unionsrc.key,unionsrc.value FROM (select 'tst1' as key, count(1) as value from src s1) unionsrc; {noformat} the problem disappears. ASTs for both queries, both before and after CBO differ only in addition of the last select expression; however, as is obvious from the below, in CBO the absence of said expression causes the count to be completely gone. Initial AST for the 2nd query (correct result): {noformat} TOK_QUERY TOK_FROM TOK_SUBQUERY TOK_QUERY TOK_FROM TOK_TABREF TOK_TABNAME src s1 TOK_INSERT TOK_DESTINATION TOK_DIR TOK_TMP_FILE TOK_SELECT TOK_SELEXPR 'tst1' key TOK_SELEXPR TOK_FUNCTION count 1 value unionsrc TOK_INSERT TOK_DESTINATION TOK_DIR TOK_TMP_FILE TOK_SELECT TOK_SELEXPR . TOK_TABLE_OR_COL unionsrc key TOK_SELEXPR . TOK_TABLE_OR_COL unionsrc value {noformat} Post-CBO {noformat} TOK_QUERY TOK_FROM TOK_SUBQUERY TOK_QUERY TOK_FROM TOK_SUBQUERY TOK_QUERY TOK_FROM TOK_TABREF TOK_TABNAME default src s1 TOK_INSERT TOK_DESTINATION TOK_DIR TOK_TMP_FILE TOK_SELECT TOK_SELEXPR 0 DUMMY $hdt$_2 TOK_INSERT TOK_DESTINATION TOK_DIR TOK_TMP_FILE TOK_SELECT TOK_SELEXPR 1 $f0 $hdt$_3 TOK_INSERT TOK_DESTINATION TOK_DIR TOK_TMP_FILE TOK_SELECT TOK_SELEXPR 'tst1' unionsrc.key TOK_SELEXPR TOK_FUNCTION count . TOK_TABLE_OR_COL $hdt$_3 $f0 unionsrc.value {noformat} Note where count is... in key-only query, where this SELEXPR is gone, there's no count, so result changes. was (Author: sershe): Minimum query to reproduce the issue: {noformat} select unionsrc.key FROM (select 'tst1' as key, count(1) as value from src s1) unionsrc; {noformat} - returns 500 rows of tst1 whereas it should return just 1. If you add value to select list - {noformat} select unionsrc.key,unionsrc.value FROM (select 'tst1' as key, count(1) as value from src s1) unionsrc; {noformat} the problem disappears. ASTs for both queries, both before and after CBO differ only in addition of the last select expression; however, as is obvious from the below, in CBO the absence of said expression causes the count to be completely gone. Initial AST for the 2nd query (correct result): {noformat} TOK_QUERY TOK_FROM TOK_SUBQUERY TOK_QUERY TOK_FROM TOK_TABREF TOK_TABNAME src s1 TOK_INSERT TOK_DESTINATION TOK_DIR TOK_TMP_FILE TOK_SELECT TOK_SELEXPR 'tst1' key TOK_SELEXPR TOK_FUNCTION count 1 value unionsrc TOK_INSERT TOK_DESTINATION TOK_DIR TOK_TMP_FILE TOK_SELECT TOK_SELEXPR . TOK_TABLE_OR_COL unionsrc key TOK_SELEXPR . TOK_TABLE_OR_COL unionsrc value {noformat} Post-CBO {noformat} TOK_QUERY TOK_FROM TOK_SUBQUERY TOK_QUERY TOK_FROM TOK_SUBQUERY TOK_QUERY TOK_FROM TOK_TABREF TOK_TABNAME default src s1 TOK_INSERT TOK_DESTINATION TOK_DIR TOK_TMP_FILE TOK_SELECT TOK_SELEXPR 0 DUMMY $hdt$_2 TOK_INSERT TOK_DESTINATION TOK_DIR TOK_TMP_FILE TOK_SELECT TOK_SELEXPR 1 $f0 $hdt$_3 TOK_INSERT TOK_DESTINATION TOK_DIR TOK_TMP_FILE TOK_SELECT TOK_SELEXPR 'tst1' unionsrc.key TOK_SELEXPR TOK_FUNCTION count . TOK_TABLE_OR_COL $hdt$_3 $f0 unionsrc.value {noformat} Note where count is... in key-only query, where this SELEXPER is gone, there's no count, so result changes. > CBO trunk merge: union11 test fails due to incorrect plan > --------------------------------------------------------- > > Key: HIVE-8225 > URL: https://issues.apache.org/jira/browse/HIVE-8225 > Project: Hive > Issue Type: Bug > Reporter: Sergey Shelukhin > Assignee: Sergey Shelukhin > > The result changes to as if the union didn't have count() inside. The issue > can be fixed by using srcunion.value outside the subquery in count (replace > count(1) with count(srcunion.value)). Otherwise, it looks like count(1) node > from union-ed queries is not present in AST at all, which might cause this > result. > Interestingly, adding group by to each query in a union produces completely > weird result (count(1) is 309 for each key, whereas it should be 1 and the > "logical" incorrect value if internal count is lost is 500) -- This message was sent by Atlassian JIRA (v6.3.4#6332)