Hi,
We stumbled on some weird behaviour of mapjoin in hive 3.1
Sample schema:
> create table table_data(key int, a int);
> insert into table_data values (1, 1), (2, 2), (1, 3), (2, 4), (3, 5);
> create table table_dict(key int, b int);
> insert into table_dict values (1, 42), (2, 43);

Query:
>SELECT xs.key, dict.key, dict.b
>FROM table_data as xs
>LEFT JOIN table_dict as dict ON if((xs.key is null) or (xs.key = ''), 44,
xs.key) = dict.key;

returns wrong result when hive.auto.convert.join=true;
+---------+-----------+---------+
| xs.key  | dict.key  | dict.b  |
+---------+-----------+---------+
| 1       | 1         | 42      |
| 2       | 1         | 43      |
| 1       | 1         | 42      |
| 2       | 1         | 43      |
| 3       | 1         | NULL    |
+---------+-----------+---------

xs.key != dict.key (but they should be cause I join on them) while dict.b
values are right

when hive.auto.convert.join=false results are currect
+---------+-----------+---------+
| xs.key  | dict.key  | dict.b  |
+---------+-----------+---------+
| 1       | 1         | 42      |
| 1       | 1         | 42      |
| 2       | 2         | 43      |
| 2       | 2         | 43      |
| 3       | NULL      | NULL    |
+---------+-----------+---------+

It is definitely caused by if expression in ON.


-- 
Andrey

Reply via email to