[jira] [Commented] (HIVE-13623) Hive on tez produce wrong results when withClause and (outer) joins

Costas Piliotis (JIRA) Wed, 04 Jan 2017 10:39:01 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-13623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799003#comment-15799003
 ]


Costas Piliotis commented on HIVE-13623:
----------------------------------------

Hi, I've just seen this issue too on tez 0.7.0.  

Query works fine on mr engine and fine in presto but tez is producing incorrect 
results.   I'm using a CTE

{code:sql}
select  tableaCTE.intCol1, tableaCTE.strCol2
from (
  select distinct intCol1, strCol2
  from tablea ) tableaCTE
    left outer join tableb on (
             tableaCTE.strCol2=tableb.strCol2 AND
            tableaCTE.intCol1=tableb.intCol1
           )
where tableb.intCol1 IS NULL
{code}

I've tried rewriting the query to use a WITH clause for the CTE and it made no 
difference.

*however* if I save my data to a temp table first, the query does return 
correct results, so I'd be inclined to think that this is a runtime 
interpretation of the CTE being handled incorrectly.

When I set the execution engine to mapreduce, it works fine and returns 
expected results, and this same query returns expected results in presto as 
well.    It appears to be a tez related issue.

I know this is marked as trivial, but I'll argue that if hiveQL is returning 
incorrect results on this pattern it shouldn't be interpreted as trivial.

Below is my actual explain plan for my real query:
{code}
Explain
Plan not optimized by CBO.

Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (CUSTOM_SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)

Stage-0
   Fetch Operator
      limit:-1
      Stage-1
         Reducer 3
         File Output Operator [FS_28]
            compressed:true
            Statistics:Num rows: 5032426 Data size: 523372372 Basic stats: 
COMPLETE Column stats: NONE
            
table:{"serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe","input 
format:":"org.apache.hadoop.mapred.TextInputFormat","output 
format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"}
            Select Operator [OP_27]
            |  outputColumnNames:["_col0","_col1","_col2","_col3","_col4"]
            |  Statistics:Num rows: 5032426 Data size: 523372372 Basic stats: 
COMPLETE Column stats: NONE
            |<-Reducer 2 [SIMPLE_EDGE]
               Reduce Output Operator [RS_26]
                  key expressions:_col0 (type: int), _col1 (type: string)
                  sort order:++
                  Statistics:Num rows: 5032426 Data size: 523372372 Basic 
stats: COMPLETE Column stats: NONE
                  value expressions:_col2 (type: int), _col3 (type: string), 
_col4 (type: boolean)
                  Select Operator [OP_25]
                     outputColumnNames:["_col0","_col1","_col2","_col3","_col4"]
                     Statistics:Num rows: 5032426 Data size: 523372372 Basic 
stats: COMPLETE Column stats: NONE
                     Filter Operator [FIL_24]
                        predicate:_col4 is null (type: boolean)
                        Statistics:Num rows: 5032426 Data size: 523372372 Basic 
stats: COMPLETE Column stats: NONE
                        Map Join Operator [MAPJOIN_23]
                        |  condition map:[{"":"Left Outer Join0 to 1"}]
                        |  keys:{"Reducer 2":"_col0 (type: int), _col1 (type: 
string)","Map 4":"dim_language_id (type: int), localized_level_name (type: 
string)"}
                        |  
outputColumnNames:["_col0","_col1","_col4","_col5","_col6"]
                        |  Statistics:Num rows: 10064852 Data size: 1046744745 
Basic stats: COMPLETE Column stats: NONE
                        |<-Map 4 [CUSTOM_SIMPLE_EDGE]
                        |  Reduce Output Operator [RS_20]
                        |     key expressions:dim_language_id (type: int), 
localized_level_name (type: string)
                        |     Map-reduce partition columns:dim_language_id 
(type: int), localized_level_name (type: string)
                        |     sort order:++
                        |     Statistics:Num rows: 7741 Data size: 1068258 
Basic stats: COMPLETE Column stats: NONE
                        |     value expressions:dim_level_nl_id (type: int)
                        |     TableScan [TS_6]
                        |        alias:da
                        |        Statistics:Num rows: 7741 Data size: 1068258 
Basic stats: COMPLETE Column stats: NONE
                        |<-Select Operator [OP_22]
                              outputColumnNames:["_col0","_col1"]
                              Statistics:Num rows: 9149866 Data size: 951586112 
Basic stats: COMPLETE Column stats: NONE
                              Group By Operator [OP_21]
                              |  keys:KEY._col0 (type: int), KEY._col1 (type: 
string), KEY._col2 (type: int)
                              |  outputColumnNames:["_col0","_col1","_col2"]
                              |  Statistics:Num rows: 9149866 Data size: 
951586112 Basic stats: COMPLETE Column stats: NONE
                              |<-Map 1 [SIMPLE_EDGE]
                                 Reduce Output Operator [RS_3]
                                    key expressions:_col0 (type: int), _col1 
(type: string), _col2 (type: int)
                                    Map-reduce partition columns:_col0 (type: 
int), _col1 (type: string), _col2 (type: int)
                                    sort order:+++
                                    Statistics:Num rows: 18299732 Data size: 
1903172224 Basic stats: COMPLETE Column stats: NONE
                                    Group By Operator [GBY_2]
                                       keys:dim_language_id (type: int), level 
(type: string), 1 (type: int)
                                       
outputColumnNames:["_col0","_col1","_col2"]
                                       Statistics:Num rows: 18299732 Data size: 
1903172224 Basic stats: COMPLETE Column stats: NONE
                                       Select Operator [SEL_1]
                                          
outputColumnNames:["dim_language_id","level"]
                                          Statistics:Num rows: 18299732 Data 
size: 1903172224 Basic stats: COMPLETE Column stats: NONE
                                          TableScan [TS_0]
                                             alias:trapteam_fact_event_timeline
                                             Statistics:Num rows: 18299732 Data 
size: 1903172224 Basic stats: COMPLETE Column stats: NONE
{code}

> Hive on tez produce wrong results when withClause and (outer) joins
> -------------------------------------------------------------------
>
>                 Key: HIVE-13623
>                 URL: https://issues.apache.org/jira/browse/HIVE-13623
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.1.0, 2.0.0
>            Reporter: JinsuKim
>            Assignee: Sivashankar
>            Priority: Trivial
>
> Hive on mr produce correct results when (outer) joins and withClause. But tez 
> produce wrong results 
> {code:sql|title=Case1}
> with a as (
> select 1 as c1
> union all
> select 2 as c1
> union all
> select 3 as c1
> ),
> b as (
> select 1 as c1
> )
> select * 
>   from ( select a.c1 as ac1, b.c1 as bc1 
>          from a left outer join b 
>          on a.c1 = b.c1 
>        ) c;
>   where c.bc1 is null
> {code}
> {code:title=case1 result}
> mr :
> +--------+--------+--+
> | c.ac1  | c.bc1  |
> +--------+--------+--+
> | 2      | NULL   |
> | 3      | NULL   |
> +--------+--------+--+
> tez :
> +--------+--------+--+
> | c.ac1  | c.bc1  |
> +--------+--------+--+
> +--------+--------+--+
> {code}
> Case2 as similar to Case1 produces same results.
> {code:sql|title=Case2}
> with a as (
> select * from j1
> ),
> b as (
> select * from j2
> )
> select * 
>   from ( select a.c1 as ac1, b.c1 as bc1 
>          from a left outer join b 
>          on a.c1 = b.c1 
>        ) c
>   where c.bc1 is null;
>   
> drop table j1;
> create table j1 as select c1 from 
> ( select 1 as c1 from default.dual
> union all
> select 2 as c1 from default.dual
> union all
> select 3 as c1 from default.dual
> ) t1;
> drop table j2;
> create table j2 as
> select 1 as c1 from default.dual;
> {code}
> {code:title=case2 result}
> mr :
> +--------+--------+--+
> | c.ac1  | c.bc1  |
> +--------+--------+--+
> | 2      | NULL   |
> | 3      | NULL   |
> +--------+--------+--+
> tez :
> +--------+--------+--+
> | c.ac1  | c.bc1  |
> +--------+--------+--+
> | 2      | NULL   |
> | 3      | NULL   |
> +--------+--------+--+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-13623) Hive on tez produce wrong results when withClause and (outer) joins

Reply via email to