The result of left semi join is not correct
-------------------------------------------

                 Key: HIVE-1723
                 URL: https://issues.apache.org/jira/browse/HIVE-1723
             Project: Hadoop Hive
          Issue Type: Bug
            Reporter: Liyin Tang


In the test case semijoin.q, there is a query:
select /*+ mapjoin(b) */ a.key from t3 a left semi join t1 b on a.key = b.key 
sort by a.key;
I think this query will return a wrong result if table t1 contains more than 
25000 distinct keys.

To keep things simple, I tried a very similar query:
select /*+ mapjoin(b) */ a.key from test_semijoin a left semi join 
test_semijoin b on a.key = b.key sort by a.key;
The test_semijoin table looks like this:
0     0
1     1
2     2
3     3
4     4
5     5
...     ...
25000   25000
25001   25001
...     ...
25999   25999
26000   26000
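
For anyone who wants to reproduce this, a data file of the shape shown above can be generated with a short script. This is only a sketch: the tab delimiter, the output path and the helper names (semijoin_rows, write_table) are my own assumptions, not something taken from the test case.

```python
def semijoin_rows(max_key=26000):
    """Build the rows of a two-column table where both columns hold the key."""
    # Keys run from 0 to max_key inclusive, matching the sample listing.
    return [f"{k}\t{k}" for k in range(max_key + 1)]


def write_table(path, max_key=26000):
    """Write the rows to a text file (hypothetical path, tab-delimited)."""
    with open(path, "w") as out:
        out.write("\n".join(semijoin_rows(max_key)) + "\n")
```

Calling write_table("test_semijoin.txt") would produce a file that could then be loaded into the table, assuming the table is declared with a matching field delimiter.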


So the correct result of this query should clearly be every key in table 
test_semijoin itself.
In fact, the result contains only part of them: keys 0 through 24544.

0
1
2
...
24543
24544
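
To make the expected result concrete, here is a small reference model of a left semi join in plain Python. It is only a sketch of the join semantics, not of how Hive implements the map join, and the function name left_semi_join is hypothetical.

```python
def left_semi_join(left_keys, right_keys):
    """Return the sorted left keys that have at least one match on the right."""
    right = set(right_keys)
    # A semi join emits each left row at most once, however many matches exist.
    return sorted(k for k in left_keys if k in right)


# Self-join of the table described in this report: keys 0..26000.
keys = list(range(26001))
expected = left_semi_join(keys, keys)

# The report observes only keys 0..24544 in the output.
observed_count = 24544 + 1
missing = len(expected) - observed_count
```

Under this model the self-join should return all 26001 keys, so the observed output is missing 1456 rows (keys 24545 through 26000).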



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
