[jira] Commented: (HIVE-1723) The result of left semi join is not correct

Namit Jain (JIRA) Mon, 18 Oct 2010 13:06:47 -0700

    [ https://issues.apache.org/jira/browse/HIVE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922231#action_12922231 ]


Namit Jain commented on HIVE-1723:
----------------------------------

Can you add a test for this?
Set hive.mapjoin.cache.numrows to a lower number in the test (it is currently 25000), so that a small table is enough to reproduce the problem.
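Here is one way to see why a lower threshold helps. The Python sketch is a hypothetical model, not Hive's actual map-join code. It models a hash table that holds at most `cache_rows` keys in memory and keeps the rest in a separate overflow tier.

A left semi join has to look each key up in both tiers. If a lookup consulted only the in-memory tier, keys past the threshold would silently drop out, giving a truncated result of the kind reported below.

`TwoTierKeySet`, `left_semi_join` and `cache_rows` are illustrative names only. Here `cache_rows` stands in for hive.mapjoin.cache.numrows.

```python
# Hypothetical model (not Hive's implementation): a map-join key table that
# keeps at most `cache_rows` keys in memory and the rest in an overflow tier.
class TwoTierKeySet:
    def __init__(self, cache_rows: int) -> None:
        self.cache_rows = cache_rows
        self.cache: set[int] = set()
        self.overflow: set[int] = set()

    def add(self, key: int) -> None:
        if key in self.cache or key in self.overflow:
            return  # duplicate keys in the small table are stored once
        if len(self.cache) < self.cache_rows:
            self.cache.add(key)
        else:
            self.overflow.add(key)

    def contains(self, key: int) -> bool:
        # A correct lookup must consult both tiers; checking only
        # self.cache would lose every key beyond the threshold.
        return key in self.cache or key in self.overflow


def left_semi_join(a_keys, b_keys, cache_rows):
    """Keys of `a` (in order, duplicates kept) that have a match in `b`."""
    table = TwoTierKeySet(cache_rows)
    for k in b_keys:
        table.add(k)
    return [k for k in a_keys if table.contains(k)]


# With a tiny threshold, a 10-row table already spills past the cache.
keys = list(range(10))
assert left_semi_join(keys, keys, cache_rows=3) == keys
```

Setting `cache_rows=3` lets a 10-row table exercise the overflow path. This is the same idea as lowering hive.mapjoin.cache.numrows in a .q test instead of loading more than 25000 rows.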


> The result of left semi join is not correct
> -------------------------------------------
>
>                 Key: HIVE-1723
>                 URL: https://issues.apache.org/jira/browse/HIVE-1723
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>
> In the test case semijoin.q, there is a query:
> select /*+ mapjoin(b) */ a.key from t3 a left semi join t1 b on a.key = b.key 
> sort by a.key;
> I think this query will return a wrong result if table t1 has more than
> 25000 distinct keys.
> To keep it simple, I tried a very similar query:
> select /*+ mapjoin(b) */ a.key from test_semijoin a left semi join 
> test_semijoin b on a.key = b.key sort by a.key;
> The table test_semijoin looks like this:
> 0     0
> 1     1
> 2     2
> 3     3
> 4     4
> 5     5
> ...     ...
> 25000   25000
> 25001   25001
> ...     ...
> 25999   25999
> 26000   26000
> So the correct result of this query should simply be every key in table
> test_semijoin itself.
> However, the actual result is only part of that: keys 0 through 24544.
> 0
> 1
> 2
> ...
> 24543
> 24544
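The size of the gap can be checked with a few lines of Python. This assumes test_semijoin holds exactly one row per key from 0 to 26000, which is what the elided listing above suggests.

Under that assumption, the query should return 26001 keys, but the reporter got 24545. That means 1456 keys (24545 through 26000) were lost. The returned keys form an unbroken prefix of the expected output.

```python
# Expected output of the self semi join on test_semijoin, assuming one row
# per key 0..26000 as in the report.
expected = list(range(0, 26001))
# Keys the reporter actually got back: 0..24544.
observed = list(range(0, 24545))

missing = sorted(set(expected) - set(observed))
print(len(expected), len(observed), len(missing))  # 26001 24545 1456
print(missing[0], missing[-1])                     # 24545 26000
# The observed rows are exactly the first 24545 expected rows.
print(observed == expected[:len(observed)])        # True
```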

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
