[ https://issues.apache.org/jira/browse/HIVE-24907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304754#comment-17304754 ]
Stamatis Zampetakis commented on HIVE-24907:
--------------------------------------------

Hi [~glapark],

I tested the following on commit 949ff1c67614d4f50a6231fc0b78ab5d753cbeb9 and 0361dd907bb75d7d7fab9d354d75b74c23775b3d in branch-3.1:
{noformat}
mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=.. -Dtest.output.overwrite
mvn test -Dtest=TestMiniTezCliDriver -Dqfile=.. -Dtest.output.overwrite
mvn test -Dtest=TestMinimrCliDriver -Dqfile=.. -Dtest.output.overwrite
{noformat}
and the results are wrong for both MiniLlap and MiniTez. I get the correct results for Minimr, which is not surprising given that even the plans are different.

Just for the record, the plan for which I get the wrong results is the following:
{noformat}
#### A masked pattern was here ####
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
#### A masked pattern was here ####
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE), Union 5 (SIMPLE_EDGE)
        Reducer 4 <- Map 3 (SIMPLE_EDGE), Union 5 (CONTAINS)
        Reducer 8 <- Map 7 (SIMPLE_EDGE), Union 5 (CONTAINS)
#### A masked pattern was here ####
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: a
                  Statistics: Num rows: 3 Data size: 12 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: key (type: int)
                    outputColumnNames: key
                    Statistics: Num rows: 3 Data size: 12 Basic stats: COMPLETE Column stats: COMPLETE
                    Group By Operator
                      keys: key (type: int)
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized, llap
            LLAP IO: no inputs
        Map 3
            Map Operator Tree:
                TableScan
                  alias: b
                  Statistics: Num rows: 3 Data size: 24 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: (value = 2001) (type: boolean)
                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: key (type: int)
                      outputColumnNames: key
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                      Group By Operator
                        keys: key (type: int)
                        mode: hash
                        outputColumnNames: _col0
                        Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
                        Reduce Output Operator
                          key expressions: _col0 (type: int)
                          sort order: +
                          Map-reduce partition columns: _col0 (type: int)
                          Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized, llap
            LLAP IO: no inputs
        Map 7
            Map Operator Tree:
                TableScan
                  alias: c
                  Statistics: Num rows: 3 Data size: 24 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: (value = 2005) (type: boolean)
                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: key (type: int)
                      outputColumnNames: key
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                      Group By Operator
                        keys: key (type: int)
                        mode: hash
                        outputColumnNames: _col0
                        Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
                        Reduce Output Operator
                          key expressions: _col0 (type: int)
                          sort order: +
                          Map-reduce partition columns: _col0 (type: int)
                          Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: vectorized, llap
            LLAP IO: no inputs
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                keys: KEY._col0 (type: int)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: llap
            Reduce Operator Tree:
              Group By Operator
                keys: KEY._col0 (type: int)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
                Merge Join Operator
                  condition map:
                       Left Outer Join 0 to 1
                  keys:
                    0 _col0 (type: int)
                    1 _col0 (type: int)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
        Reducer 4
            Execution mode: vectorized, llap
            Reduce Operator Tree:
              Group By Operator
                keys: KEY._col0 (type: int)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
                Group By Operator
                  keys: _col0 (type: int)
                  mode: hash
                  outputColumnNames: _col0
                  Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
                  Reduce Output Operator
                    key expressions: _col0 (type: int)
                    sort order: +
                    Map-reduce partition columns: _col0 (type: int)
                    Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
        Reducer 8
            Execution mode: vectorized, llap
            Reduce Operator Tree:
              Group By Operator
                keys: KEY._col0 (type: int)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
                Group By Operator
                  keys: _col0 (type: int)
                  mode: hash
                  outputColumnNames: _col0
                  Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
                  Reduce Output Operator
                    key expressions: _col0 (type: int)
                    sort order: +
                    Map-reduce partition columns: _col0 (type: int)
                    Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
        Union 5
            Vertex: Union 5

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{noformat}

> Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY
> -------------------------------------------------------------------
>
> Key: HIVE-24907
> URL: https://issues.apache.org/jira/browse/HIVE-24907
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 2.4.0, 3.2.0, 4.0.0
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
>
> The following SQL query returns wrong results when run in TEZ/LLAP:
> {code:sql}
> SET hive.auto.convert.sortmerge.join=true;
> CREATE TABLE tbl (key int,value int);
> INSERT INTO tbl VALUES (1, 2000);
> INSERT INTO tbl VALUES (2, 2001);
> INSERT INTO tbl VALUES (3, 2005);
> SELECT sub1.key, sub2.key
> FROM
> (SELECT a.key FROM tbl a GROUP BY a.key) sub1
> LEFT OUTER JOIN (
> SELECT b.key FROM tbl b WHERE b.value = 2001 GROUP BY b.key
> UNION
> SELECT c.key FROM tbl c WHERE c.value = 2005 GROUP BY c.key) sub2
> ON sub1.key = sub2.key;
> {code}
> Actual results:
> ||SUB1.KEY||SUB2.KEY||
> |1|NULL|
> |2|NULL|
> |3|NULL|
> Expected results:
> ||SUB1.KEY||SUB2.KEY||
> |1|NULL|
> |2|2|
> |3|3|
> Tested; the problem can be reproduced with {{TestMiniLlapLocalCliDriver}} or
> {{TestMiniTezCliDriver}} in older versions of Hive.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
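
As an independent sanity check of the expected results quoted in the issue description, the same tables and query can be run on another SQL engine. The following is only a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in engine; SQLite is not Hive, the {{SET hive.auto.convert.sortmerge.join=true}} line has no equivalent there and is omitted, the helper name expected_rows is hypothetical, and an ORDER BY is added only to make the row order deterministic. It says nothing about the Tez/LLAP plan itself; it only illustrates what standard LEFT OUTER JOIN semantics should return for this data.

```python
import sqlite3


def expected_rows():
    """Run the query from the issue on an in-memory SQLite database.

    SQLite is used purely as a reference engine (an assumption of this
    sketch); the Hive-specific SET statement is intentionally left out.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Same table and data as in the issue description.
        conn.executescript(
            """
            CREATE TABLE tbl (key int, value int);
            INSERT INTO tbl VALUES (1, 2000);
            INSERT INTO tbl VALUES (2, 2001);
            INSERT INTO tbl VALUES (3, 2005);
            """
        )
        # Same query as in the issue; ORDER BY added for a stable row order.
        cursor = conn.execute(
            """
            SELECT sub1.key, sub2.key
            FROM
              (SELECT a.key FROM tbl a GROUP BY a.key) sub1
            LEFT OUTER JOIN (
              SELECT b.key FROM tbl b WHERE b.value = 2001 GROUP BY b.key
              UNION
              SELECT c.key FROM tbl c WHERE c.value = 2005 GROUP BY c.key) sub2
            ON sub1.key = sub2.key
            ORDER BY sub1.key
            """
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for sub1_key, sub2_key in expected_rows():
        print(sub1_key, "NULL" if sub2_key is None else sub2_key)
```

Keys 2 and 3 each have a matching row in the UNION subquery, so only key 1 should be padded with NULL on the right side of the join.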