[jira] [Commented] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results

Hudson (JIRA) Fri, 02 Aug 2013 13:18:36 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13728042#comment-13728042
 ]


Hudson commented on HIVE-4952:
------------------------------

ABORTED: Integrated in Hive-trunk-hadoop2 #322 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/322/])
HIVE-4952 : When hive.join.emit.interval is small, queries optimized by 
Correlation Optimizer may generate wrong results (Yin Huai via Ashutosh 
Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1509542)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
* /hive/trunk/ql/src/test/queries/clientpositive/correlationoptimizer15.q
* /hive/trunk/ql/src/test/results/clientpositive/correlationoptimizer15.q.out

                
> When hive.join.emit.interval is small, queries optimized by Correlation 
> Optimizer may generate wrong results
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-4952
>                 URL: https://issues.apache.org/jira/browse/HIVE-4952
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>             Fix For: 0.12.0
>
>         Attachments: HIVE-4952.D11889.1.patch, HIVE-4952.D11889.2.patch, 
> replay.txt
>
>
> If we have a query like this ...
> {code:sql}
> SELECT xx.key, xx.cnt, yy.key
> FROM
> (SELECT x.key as key, count(1) as cnt FROM src1 x JOIN src1 y ON (x.key = 
> y.key) group by x.key) xx
> JOIN src yy
> ON xx.key=yy.key;
> {code}
> After the Correlation Optimizer is applied, the operator tree in the reducer will be:
> {code}
>      JOIN2
>        |
>        |
>       MUX
>      /   \
>     /     \
>    GBY     |
>     |      |
>   JOIN1    |
>     \     /
>      \   /
>      DEMUX
> {code}
> For JOIN2, rows from the right table arrive at this operator first. If 
> hive.join.emit.interval is small, e.g. 1, JOIN2 will emit results even 
> if it has not yet received any row from the left table. The logic related to 
> hive.join.emit.interval in JoinOperator assumes that inputs are ordered 
> by tag. However, if a query has been optimized by the Correlation Optimizer, 
> this assumption may not hold for JoinOperators inside the reducer.
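>
> A minimal reproduction sketch, assuming the Correlation Optimizer is switched 
> on through hive.optimize.correlation and that the src and src1 test tables 
> are loaded. The two SET statements are assumptions added for illustration and 
> are not part of the original report; the query itself is the one from the 
> description.
> {code:sql}
> -- Assumption: enables the Correlation Optimizer for this session.
> SET hive.optimize.correlation=true;
> -- Small emit interval, matching the "e.g. 1" case in the description.
> SET hive.join.emit.interval=1;
>
> SELECT xx.key, xx.cnt, yy.key
> FROM
> (SELECT x.key AS key, count(1) AS cnt FROM src1 x JOIN src1 y ON (x.key = y.key) GROUP BY x.key) xx
> JOIN src yy
> ON xx.key = yy.key;
> {code}
> Running the same query with hive.optimize.correlation=false and comparing 
> the two result sets is one way to check whether the optimized plan is 
> affected.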

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
