Hi, We stumbled on some weird behaviour of mapjoin in hive 3.1 Sample schema: > create table table_data(key int, a int); > insert into table_data values (1, 1), (2, 2), (1, 3), (2, 4), (3, 5); > create table table_dict(key int, b int); > insert into table_dict values (1, 42), (2, 43);
Query: >SELECT xs.key, dict.key, dict.b >FROM table_data as xs >LEFT JOIN table_dict as dict ON if((xs.key is null) or (xs.key = ''), 44, xs.key) = dict.key; returns wrong result when hive.auto.convert.join=true; +---------+-----------+---------+ | xs.key | dict.key | dict.b | +---------+-----------+---------+ | 1 | 1 | 42 | | 2 | 1 | 43 | | 1 | 1 | 42 | | 2 | 1 | 43 | | 3 | 1 | NULL | +---------+-----------+--------- xs.key != dict.key (but they should be cause I join on them) while dict.b values are right when hive.auto.convert.join=false results are currect +---------+-----------+---------+ | xs.key | dict.key | dict.b | +---------+-----------+---------+ | 1 | 1 | 42 | | 1 | 1 | 42 | | 2 | 2 | 43 | | 2 | 2 | 43 | | 3 | NULL | NULL | +---------+-----------+---------+ It is definitely caused by if expression in ON. -- Andrey