[ https://issues.apache.org/jira/browse/HIVE-13623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799003#comment-15799003 ]
Costas Piliotis commented on HIVE-13623: ---------------------------------------- Hi, I've just seen this issue too on tez 0.7.0. Query works fine on mr engine and fine in presto but tez is producing incorrect results. I'm using a CTE {code:sql} select tableaCTE.intCol1, tableaCTE.strCol2 from ( select distinct intCol1, strCol2 from tablea ) tableaCTE left outer join tableb on ( tableaCTE.strCol2=tableb.strCol2 AND tableaCTE.intCol1=tableb.intCol1 ) where tableb.intCol1 IS NULL {code} I've tried rewriting the query to use a WITH clause for the CTE and it made no difference. *however* if I save my data to a temp table first, the query does return correct results, so I'd be inclined to think that this is a runtime interpretation of the CTE being handled incorrectly. When I set the execution engine to mapreduce, it works fine and returns expected results, and this same query returns expected results in presto as well. It appears to be a tez related issue. I know this is marked as trivial, but I'll argue that if hiveQL is returning incorrect results on this pattern it shouldn't be interpreted as trivial. Below is my actual explain plan for my real query: {code} Explain Plan not optimized by CBO. Vertex dependency in root stage Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (CUSTOM_SIMPLE_EDGE) Reducer 3 <- Reducer 2 (SIMPLE_EDGE) Stage-0 Fetch Operator limit:-1 Stage-1 Reducer 3 File Output Operator [FS_28] compressed:true Statistics:Num rows: 5032426 Data size: 523372372 Basic stats: COMPLETE Column stats: NONE table:{"serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe","input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"} Select Operator [OP_27] | outputColumnNames:["_col0","_col1","_col2","_col3","_col4"] | Statistics:Num rows: 5032426 Data size: 523372372 Basic stats: COMPLETE Column stats: NONE |<-Reducer 2 [SIMPLE_EDGE] Reduce Output Operator [RS_26] key expressions:_col0 (type: int), _col1 (type: string) sort order:++ Statistics:Num rows: 5032426 Data size: 523372372 Basic stats: COMPLETE Column stats: NONE value expressions:_col2 (type: int), _col3 (type: string), _col4 (type: boolean) Select Operator [OP_25] outputColumnNames:["_col0","_col1","_col2","_col3","_col4"] Statistics:Num rows: 5032426 Data size: 523372372 Basic stats: COMPLETE Column stats: NONE Filter Operator [FIL_24] predicate:_col4 is null (type: boolean) Statistics:Num rows: 5032426 Data size: 523372372 Basic stats: COMPLETE Column stats: NONE Map Join Operator [MAPJOIN_23] | condition map:[{"":"Left Outer Join0 to 1"}] | keys:{"Reducer 2":"_col0 (type: int), _col1 (type: string)","Map 4":"dim_language_id (type: int), localized_level_name (type: string)"} | outputColumnNames:["_col0","_col1","_col4","_col5","_col6"] | Statistics:Num rows: 10064852 Data size: 1046744745 Basic stats: COMPLETE Column stats: NONE |<-Map 4 [CUSTOM_SIMPLE_EDGE] | Reduce Output Operator [RS_20] | key expressions:dim_language_id (type: int), localized_level_name (type: string) | Map-reduce partition columns:dim_language_id (type: int), localized_level_name (type: string) | sort order:++ | Statistics:Num rows: 7741 Data size: 1068258 Basic stats: COMPLETE Column stats: NONE | value expressions:dim_level_nl_id (type: int) | TableScan [TS_6] | alias:da | Statistics:Num rows: 7741 Data size: 1068258 Basic stats: COMPLETE Column stats: NONE |<-Select Operator [OP_22] outputColumnNames:["_col0","_col1"] Statistics:Num rows: 9149866 Data size: 951586112 Basic stats: COMPLETE Column stats: NONE Group By Operator [OP_21] | keys:KEY._col0 (type: int), KEY._col1 (type: string), KEY._col2 (type: int) | outputColumnNames:["_col0","_col1","_col2"] | Statistics:Num rows: 9149866 Data size: 951586112 Basic stats: COMPLETE Column stats: NONE |<-Map 1 [SIMPLE_EDGE] Reduce Output Operator [RS_3] key expressions:_col0 (type: int), _col1 (type: string), _col2 (type: int) Map-reduce partition columns:_col0 (type: int), _col1 (type: string), _col2 (type: int) sort order:+++ Statistics:Num rows: 18299732 Data size: 1903172224 Basic stats: COMPLETE Column stats: NONE Group By Operator [GBY_2] keys:dim_language_id (type: int), level (type: string), 1 (type: int) outputColumnNames:["_col0","_col1","_col2"] Statistics:Num rows: 18299732 Data size: 1903172224 Basic stats: COMPLETE Column stats: NONE Select Operator [SEL_1] outputColumnNames:["dim_language_id","level"] Statistics:Num rows: 18299732 Data size: 1903172224 Basic stats: COMPLETE Column stats: NONE TableScan [TS_0] alias:trapteam_fact_event_timeline Statistics:Num rows: 18299732 Data size: 1903172224 Basic stats: COMPLETE Column stats: NONE {code} > Hive on tez produce wrong results when withClause and (outer) joins > ------------------------------------------------------------------- > > Key: HIVE-13623 > URL: https://issues.apache.org/jira/browse/HIVE-13623 > Project: Hive > Issue Type: Bug > Affects Versions: 1.1.0, 2.0.0 > Reporter: JinsuKim > Assignee: Sivashankar > Priority: Trivial > > Hive on mr produce correct results when (outer) joins and withClause. But tez > produce wrong results > {code:sql|title=Case1} > with a as ( > select 1 as c1 > union all > select 2 as c1 > union all > select 3 as c1 > ), > b as ( > select 1 as c1 > ) > select * > from ( select a.c1 as ac1, b.c1 as bc1 > from a left outer join b > on a.c1 = b.c1 > ) c; > where c.bc1 is null > {code} > {code:title=case1 result} > mr : > +--------+--------+--+ > | c.ac1 | c.bc1 | > +--------+--------+--+ > | 2 | NULL | > | 3 | NULL | > +--------+--------+--+ > tez : > +--------+--------+--+ > | c.ac1 | c.bc1 | > +--------+--------+--+ > +--------+--------+--+ > {code} > Case2 as similar to Case1 produces same results. > {code:sql|title=Case2} > with a as ( > select * from j1 > ), > b as ( > select * from j2 > ) > select * > from ( select a.c1 as ac1, b.c1 as bc1 > from a left outer join b > on a.c1 = b.c1 > ) c > where c.bc1 is null; > > drop table j1; > create table j1 as select c1 from > ( select 1 as c1 from default.dual > union all > select 2 as c1 from default.dual > union all > select 3 as c1 from default.dual > ) t1; > drop table j2; > create table j2 as > select 1 as c1 from default.dual; > {code} > {code:title=case2 result} > mr : > +--------+--------+--+ > | c.ac1 | c.bc1 | > +--------+--------+--+ > | 2 | NULL | > | 3 | NULL | > +--------+--------+--+ > tez : > +--------+--------+--+ > | c.ac1 | c.bc1 | > +--------+--------+--+ > | 2 | NULL | > | 3 | NULL | > +--------+--------+--+ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)