[ https://issues.apache.org/jira/browse/HIVE-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vikram Dixit K updated HIVE-5973:
---------------------------------

    Attachment: HIVE-5973.1.patch

Attached are a test and a fix. The problem occurs when the small table is 
bucketed and partitioned and is read through a select sub-query. The select 
operator introduced by the sub-query causes the issue described below.

Thanks to [~rhbutani] for helping with the solution and test case. It looks 
like the right way to run this type of test is via the MinimrCliDriver, because 
the CliDriver tests mask the issue: they run with a single reducer, which 
results in incorrect bucketing.

> SMB joins produce incorrect results with multiple partitions and buckets
> ------------------------------------------------------------------------
>
>                 Key: HIVE-5973
>                 URL: https://issues.apache.org/jira/browse/HIVE-5973
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.13.0
>            Reporter: Vikram Dixit K
>            Assignee: Vikram Dixit K
>             Fix For: 0.13.0
>
>         Attachments: HIVE-5973.1.patch
>
>
> It looks like there is an issue with re-using the output object array in the 
> select operator. When we read rows of the non-big tables, we hold on to the 
> output object in the priority queue. This causes Hive to produce incorrect 
> results: all the elements in the priority queue refer to the same object, so 
> the join happens on only one of the buckets.
> {noformat}
> output[i] = eval[i].evaluate(row);
> {noformat}
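
To make the aliasing described above concrete, here is a minimal standalone 
sketch in plain Java. It is not Hive code and not necessarily what the attached 
patch does: the class ReusedRowDemo, the method drain, and the 
copyBeforeQueueing flag are hypothetical names used only for illustration. It 
assumes one row per bucket and a single evaluated column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ReusedRowDemo {

    // Hypothetical stand-in for a select operator feeding a priority queue.
    // Each entry of bucketKeys plays the role of the current row of one bucket.
    static List<Integer> drain(int[] bucketKeys, boolean copyBeforeQueueing) {
        // One output array, reused for every row (the pattern in question).
        Object[] output = new Object[1];

        // Order queued rows by their single key column.
        Comparator<Object[]> byKey =
                Comparator.comparingInt((Object[] r) -> (Integer) r[0]);
        PriorityQueue<Object[]> queue = new PriorityQueue<>(byKey);

        for (int key : bucketKeys) {
            // Mirrors: output[i] = eval[i].evaluate(row);
            output[0] = key;
            // Without a copy, every queue entry is the same array instance.
            queue.add(copyBeforeQueueing
                    ? Arrays.copyOf(output, output.length)
                    : output);
        }

        // Read back the key of each queued row in priority order.
        List<Integer> seen = new ArrayList<>();
        while (!queue.isEmpty()) {
            seen.add((Integer) queue.poll()[0]);
        }
        return seen;
    }

    public static void main(String[] args) {
        int[] keys = {30, 10, 20};
        System.out.println(drain(keys, false)); // prints [20, 20, 20]
        System.out.println(drain(keys, true));  // prints [10, 20, 30]
    }
}
```

With the shared array, every queued entry reports the last key written, so 
only one bucket's row is effectively visible. Copying the array before 
queueing is one way to avoid this in the sketch; the actual fix may differ.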



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)