[ https://issues.apache.org/jira/browse/HIVE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922231#action_12922231 ]
Namit Jain commented on HIVE-1723:
----------------------------------

Can you add a test for this? Set hive.mapjoin.cache.numrows to a lower number; it is currently set to 25000.

> The result of left semi join is not correct
> -------------------------------------------
>
>                 Key: HIVE-1723
>                 URL: https://issues.apache.org/jira/browse/HIVE-1723
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>
> In the test case semijoin.q, there is a query:
> select /*+ mapjoin(b) */ a.key from t3 a left semi join t1 b on a.key = b.key
> sort by a.key;
> I think this query will return a wrong result if table t1 has more than
> 25000 distinct keys.
> To keep it simple, I tried a very similar query:
> select /*+ mapjoin(b) */ a.key from test_semijoin a left semi join
> test_semijoin b on a.key = b.key sort by a.key;
> The table test_semijoin looks like this:
> 0 0
> 1 1
> 2 2
> 3 3
> 4 4
> 5 5
> ... ...
> ... ...
> 25000 25000
> 25001 25001
> ... ...
> ... ...
> 25999 25999
> 26000 26000
> So the correct result of this query should be exactly the keys of table
> test_semijoin itself.
> However, the actual result is only part of that: keys 0 to 24544.
> 0
> 1
> 2
> ..
> ..
> 24543
> 24544
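
To make the expected result concrete, here is a minimal Python sketch of left semi join semantics applied to the self-join described in the issue. It is only a model of what the query should return, not Hive's map-join implementation. The helper name left_semi_join is hypothetical, and the sketch assumes test_semijoin holds the contiguous keys 0 through 26000, which the elided rows in the report suggest but do not confirm.

```python
def left_semi_join(left_keys, right_keys):
    """Model of LEFT SEMI JOIN: keep each left row whose key has
    at least one match on the right side, emitting it exactly once."""
    right = set(right_keys)
    return [k for k in left_keys if k in right]


# Assumption: test_semijoin contains the contiguous keys 0..26000.
table_keys = list(range(26001))

# Self semi join followed by "sort by a.key".
expected = sorted(left_semi_join(table_keys, table_keys))

# Keys 0..24544, the result reported in the issue.
reported = list(range(24545))

# Keys that should appear in the result but were not returned.
missing = sorted(set(expected) - set(reported))

print(len(expected), len(reported), len(missing))
# prints: 26001 24545 1456
```

Under that assumption, every key from 24545 through 26000 is absent from the reported output, which is the kind of difference a test with a lowered hive.mapjoin.cache.numrows would need to catch.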