Hi, We use hive 3.1 from hdp 3.0.1 Here is explain output +----------------------------------------------------+ | Explain | +----------------------------------------------------+ | STAGE DEPENDENCIES: | | Stage-1 is a root stage | | Stage-0 depends on stages: Stage-1 | | | | STAGE PLANS: | | Stage: Stage-1 | | Tez | | DagId: hive_20181222172612_6c273743-d8c0-445e-808b-448f41f89965:879 | | Edges: | | Map 1 <- Map 2 (BROADCAST_EDGE) | | DagName: hive_20181222172612_6c273743-d8c0-445e-808b-448f41f89965:879 | | Vertices: | | Map 1 | | Map Operator Tree: | | TableScan | | alias: xs | | Statistics: Num rows: 5 Data size: 20 Basic stats: COMPLETE Column stats: COMPLETE | | Select Operator | | expressions: key (type: int) | | outputColumnNames: _col0 | | Statistics: Num rows: 5 Data size: 20 Basic stats: COMPLETE Column stats: COMPLETE | | Map Join Operator | | condition map: | | Left Outer Join 0 to 1 | | keys: | | 0 if(_col0 is null, 44, _col0) (type: int) | | 1 _col0 (type: int) | | outputColumnNames: _col0, _col1, _col2 | | input vertices: | | 1 Map 2 | | Statistics: Num rows: 5 Data size: 52 Basic stats: COMPLETE Column stats: COMPLETE | | File Output Operator | | compressed: false | | Statistics: Num rows: 5 Data size: 52 Basic stats: COMPLETE Column stats: COMPLETE | | table: | | input format: org.apache.hadoop.mapred.SequenceFileInputFormat | | output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat | | serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | | Execution mode: vectorized | | Map 2 | | Map Operator Tree: | | TableScan | | alias: dict | | Statistics: Num rows: 2 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE | | Select Operator | | expressions: key (type: int), b (type: int) | | outputColumnNames: _col0, _col1 | | Statistics: Num rows: 2 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE | | Reduce Output Operator | | key expressions: _col0 (type: int) | | sort order: + | | Map-reduce partition columns: _col0 (type: int) | | Statistics: Num rows: 2 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE | | value expressions: _col1 (type: int) | | Execution mode: vectorized | | | | Stage: Stage-0 | | Fetch Operator | | limit: -1 | | Processor Tree: | | ListSink | | | +----------------------------------------------------+
On Fri, Dec 21, 2018 at 8:39 PM Vineet Garg <vg...@hortonworks.com> wrote: > Hi Andrey, > > I tried this on latest master and wasn’t able to reproduce. Would you mind > sharing explain plan output? (after setting hive.user.explain = false). > > Vineet > > > On Dec 20, 2018, at 11:37 AM, Andrey Zinovyev <andrey.zinov...@gmail.com> > wrote: > > > > Hi, > > We stumbled on some weird behaviour of mapjoin in hive 3.1 > > Sample schema: > > > create table table_data(key int, a int); > > > insert into table_data values (1, 1), (2, 2), (1, 3), (2, 4), (3, 5); > > > create table table_dict(key int, b int); > > > insert into table_dict values (1, 42), (2, 43); > > > > Query: > > >SELECT xs.key, dict.key, dict.b > > >FROM table_data as xs > > >LEFT JOIN table_dict as dict ON if((xs.key is null) or (xs.key = ''), > 44, xs.key) = dict.key; > > > > returns wrong result when hive.auto.convert.join=true; > > +---------+-----------+---------+ > > | xs.key | dict.key | dict.b | > > +---------+-----------+---------+ > > | 1 | 1 | 42 | > > | 2 | 1 | 43 | > > | 1 | 1 | 42 | > > | 2 | 1 | 43 | > > | 3 | 1 | NULL | > > +---------+-----------+--------- > > > > xs.key != dict.key (but they should be cause I join on them) while > dict.b values are right > > > > when hive.auto.convert.join=false results are currect > > +---------+-----------+---------+ > > | xs.key | dict.key | dict.b | > > +---------+-----------+---------+ > > | 1 | 1 | 42 | > > | 1 | 1 | 42 | > > | 2 | 2 | 43 | > > | 2 | 2 | 43 | > > | 3 | NULL | NULL | > > +---------+-----------+---------+ > > > > It is definitely caused by if expression in ON. > > > > > > -- > > Andrey > > -- С уважением Зиновьев Андрей