Re: hive 3.1 mapjoin with complex predicate produce incorrect results

Andrey Zinovyev Sat, 22 Dec 2018 06:36:19 -0800

Hi,
We use hive 3.1 from hdp 3.0.1
Here is explain output
+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
| STAGE DEPENDENCIES:                                |
|   Stage-1 is a root stage                          |
|   Stage-0 depends on stages: Stage-1               |
|                                                    |
| STAGE PLANS:                                       |
|   Stage: Stage-1                                   |
|     Tez                                            |
|       DagId: hive_20181222172612_6c273743-d8c0-445e-808b-448f41f89965:879
|
|       Edges:                                       |
|         Map 1 <- Map 2 (BROADCAST_EDGE)            |
|       DagName:
hive_20181222172612_6c273743-d8c0-445e-808b-448f41f89965:879 |
|       Vertices:                                    |
|         Map 1                                      |
|             Map Operator Tree:                     |
|                 TableScan                          |
|                   alias: xs                        |
|                   Statistics: Num rows: 5 Data size: 20 Basic stats:
COMPLETE Column stats: COMPLETE |
|                   Select Operator                  |
|                     expressions: key (type: int)   |
|                     outputColumnNames: _col0       |
|                     Statistics: Num rows: 5 Data size: 20 Basic stats:
COMPLETE Column stats: COMPLETE |
|                     Map Join Operator              |
|                       condition map:               |
|                            Left Outer Join 0 to 1  |
|                       keys:                        |
|                         0 if(_col0 is null, 44, _col0) (type: int) |
|                         1 _col0 (type: int)        |
|                       outputColumnNames: _col0, _col1, _col2 |
|                       input vertices:              |
|                         1 Map 2                    |
|                       Statistics: Num rows: 5 Data size: 52 Basic stats:
COMPLETE Column stats: COMPLETE |
|                       File Output Operator         |
|                         compressed: false          |
|                         Statistics: Num rows: 5 Data size: 52 Basic
stats: COMPLETE Column stats: COMPLETE |
|                         table:                     |
|                             input format:
org.apache.hadoop.mapred.SequenceFileInputFormat |
|                             output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
|                             serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|             Execution mode: vectorized             |
|         Map 2                                      |
|             Map Operator Tree:                     |
|                 TableScan                          |
|                   alias: dict                      |
|                   Statistics: Num rows: 2 Data size: 16 Basic stats:
COMPLETE Column stats: COMPLETE |
|                   Select Operator                  |
|                     expressions: key (type: int), b (type: int) |
|                     outputColumnNames: _col0, _col1 |
|                     Statistics: Num rows: 2 Data size: 16 Basic stats:
COMPLETE Column stats: COMPLETE |
|                     Reduce Output Operator         |
|                       key expressions: _col0 (type: int) |
|                       sort order: +                |
|                       Map-reduce partition columns: _col0 (type: int) |
|                       Statistics: Num rows: 2 Data size: 16 Basic stats:
COMPLETE Column stats: COMPLETE |
|                       value expressions: _col1 (type: int) |
|             Execution mode: vectorized             |
|                                                    |
|   Stage: Stage-0                                   |
|     Fetch Operator                                 |
|       limit: -1                                    |
|       Processor Tree:                              |
|         ListSink                                   |
|                                                    |
+----------------------------------------------------+


On Fri, Dec 21, 2018 at 8:39 PM Vineet Garg <vg...@hortonworks.com> wrote:

> Hi Andrey,
>
> I tried this on latest master and wasn’t able to reproduce. Would you mind
> sharing explain plan output? (after setting hive.user.explain = false).
>
> Vineet
>
> > On Dec 20, 2018, at 11:37 AM, Andrey Zinovyev <andrey.zinov...@gmail.com>
> wrote:
> >
> > Hi,
> > We stumbled on some weird behaviour of mapjoin in hive 3.1
> > Sample schema:
> > > create table table_data(key int, a int);
> > > insert into table_data values (1, 1), (2, 2), (1, 3), (2, 4), (3, 5);
> > > create table table_dict(key int, b int);
> > > insert into table_dict values (1, 42), (2, 43);
> >
> > Query:
> > >SELECT xs.key, dict.key, dict.b
> > >FROM table_data as xs
> > >LEFT JOIN table_dict as dict ON if((xs.key is null) or (xs.key = ''),
> 44, xs.key) = dict.key;
> >
> > returns wrong result when hive.auto.convert.join=true;
> > +---------+-----------+---------+
> > | xs.key  | dict.key  | dict.b  |
> > +---------+-----------+---------+
> > | 1       | 1         | 42      |
> > | 2       | 1         | 43      |
> > | 1       | 1         | 42      |
> > | 2       | 1         | 43      |
> > | 3       | 1         | NULL    |
> > +---------+-----------+---------
> >
> > xs.key != dict.key (but they should be cause I join on them) while
> dict.b values are right
> >
> > when hive.auto.convert.join=false results are currect
> > +---------+-----------+---------+
> > | xs.key  | dict.key  | dict.b  |
> > +---------+-----------+---------+
> > | 1       | 1         | 42      |
> > | 1       | 1         | 42      |
> > | 2       | 2         | 43      |
> > | 2       | 2         | 43      |
> > | 3       | NULL      | NULL    |
> > +---------+-----------+---------+
> >
> > It is definitely caused by if expression in ON.
> >
> >
> > --
> > Andrey
>
>

-- 
С уважением
Зиновьев Андрей

Re: hive 3.1 mapjoin with complex predicate produce incorrect results

Reply via email to