[
https://issues.apache.org/jira/browse/HIVE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Carl Steinbach resolved HIVE-1723.
----------------------------------
Resolution: Duplicate
> The result of left semi join is not correct
> -------------------------------------------
>
> Key: HIVE-1723
> URL: https://issues.apache.org/jira/browse/HIVE-1723
> Project: Hive
> Issue Type: Bug
> Reporter: Liyin Tang
> Assignee: Liyin Tang
>
> In the test case semijoin.q, there is a query:
> select /*+ mapjoin(b) */ a.key from t3 a left semi join t1 b on a.key = b.key
> sort by a.key;
> I think this query will return a wrong result if table t1 is larger than
> 25000 different keys
> To be simple, I tried a very similar query:
> select /*+ mapjoin(b) */ a.key from test_semijoin a left semi join
> test_semijoin b on a.key = b.key sort by a.key;
> The table of test_semijoin is like
> 0 0
> 1 1
> 2 2
> 3 3
> 4 4
> 5 5
> ... ...
> ... ....
> 25000 25000
> 25001 25001
> ... ....
> ... ....
> 25999 25999
> 26000 26000
> So we can easily estimate the correct result of this query should be the same
> keys from table test_semijoin itsel.
> Actually, the result is only part of that: only from 0 to 24544.
> 0
> 1
> 2
> ..
> ..
> 24543
> 24544
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira