[
https://issues.apache.org/jira/browse/HIVE-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696736#comment-13696736
]
Hudson commented on HIVE-4781:
------------------------------
Integrated in Hive-trunk-hadoop2 #266 (See
[https://builds.apache.org/job/Hive-trunk-hadoop2/266/])
HIVE-4781 : LEFT SEMI JOIN generates wrong results when the number of rows
belonging to a single key of the right table exceed hive.join.emit.interval
(Yin Huai via Ashutosh Chauhan) (Revision 1498150)
Result = FAILURE
hashutosh :
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1498150
Files :
* /hive/trunk/build-common.xml
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java
* /hive/trunk/ql/src/test/queries/clientpositive/leftsemijoin_mr.q
* /hive/trunk/ql/src/test/results/clientpositive/leftsemijoin_mr.q.out
> LEFT SEMI JOIN generates wrong results when the number of rows belonging to a
> single key of the right table exceed hive.join.emit.interval
> ------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-4781
> URL: https://issues.apache.org/jira/browse/HIVE-4781
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.12.0
> Reporter: Yin Huai
> Assignee: Yin Huai
> Fix For: 0.12.0
>
> Attachments: HIVE-4781.txt, wrong_semi_join.txt
>
>
> Suppose we have the query shown below:
> {code:sql}
> SELECT key FROM t1 LEFT SEMI JOIN t2 ON (t1.key=t2.key);
> {code}
> When the number of t2 rows belonging to a single key exceeds
> hive.join.emit.interval, JoinOperator emits the matching t1 row more than
> once, which produces redundant output. Let's say t1 is
> {code}
> 1
> {code}
> and t2 is
> {code}
> 1
> 1
> 1
> 1
> {code}
> When hive.join.emit.interval=1, the output of the above query will be
> {code}
> 1
> 1
> 1
> 1
> {code}
> The correct result should be
> {code}
> 1
> {code}
> This problem cannot be found by the unit tests. Because a GBY operator is
> inserted before JoinOperator and there is only one mapper, the output of the
> map phase contains only distinct keys.
> Please apply the attached patch 'wrong_semi_join.txt' and run
> {code}
> ant test -Dtestcase=TestMinimrCliDriver -Dqfile="left_semi_join.q" -Dtest.silent=false
> {code}
> to reproduce the problem. The wrong result can be found in
> {code}
> <hive_root_dir>/build/ql/test/logs/clientpositive
> {code}
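
The duplication described in the quoted report can be illustrated with a small toy model. The sketch below is not Hive's JoinOperator code; the function names and the exact flush condition are assumptions made for illustration only. It models a reducer that handles one join key, flushes after every emit_interval right-side rows, and emits the left-side rows again on each flush.

```python
def semi_join_buggy(left_rows, right_rows, emit_interval):
    """Toy model of the reported behavior for a single join key.

    Hypothetical helper, not Hive code: every time emit_interval right-side
    rows have been seen, the left-side rows are emitted again.
    """
    output = []
    buffered = 0
    for _ in right_rows:
        buffered += 1
        if buffered >= emit_interval:
            # Early flush: the left rows are emitted once per batch,
            # which is wrong for a semi join.
            output.extend(left_rows)
            buffered = 0
    if buffered > 0:
        # Final flush for the remaining right rows.
        output.extend(left_rows)
    return output


def semi_join_expected(left_rows, right_rows):
    """Semi-join semantics: each left row appears at most once,
    and only if at least one right row matches."""
    return list(left_rows) if right_rows else []


t1 = [1]
t2 = [1, 1, 1, 1]
buggy = semi_join_buggy(t1, t2, emit_interval=1)
expected = semi_join_expected(t1, t2)
```

With the t1 and t2 values from the report and an interval of 1, the toy model returns four rows where a semi join should return one. With an interval larger than the number of t2 rows it returns the single expected row.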