[ 
https://issues.apache.org/jira/browse/HIVE-24199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24199:
-------------------------------------------
    Fix Version/s: 4.0.0

> Incorrect result when subquey in exists contains limit
> ------------------------------------------------------
>
>                 Key: HIVE-24199
>                 URL: https://issues.apache.org/jira/browse/HIVE-24199
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> {code:java}
> create table web_sales (ws_order_number int, ws_warehouse_sk int) stored as 
> orc;
> insert into web_sales values
> (1, 1),
> (1, 2),
> (2, 1),
> (2, 2);
> select * from web_sales ws1
> where exists (select 1 from web_sales ws2 where ws1.ws_order_number = 
> ws2.ws_order_number limit 1);
> 1     1
> 1     2
> {code}
> {code:java}
> CBO PLAN:
> HiveSemiJoin(condition=[=($0, $2)], joinType=[semi])
>   HiveProject(ws_order_number=[$0], ws_warehouse_sk=[$1])
>     HiveFilter(condition=[IS NOT NULL($0)])
>       HiveTableScan(table=[[default, web_sales]], table:alias=[ws1])
>   HiveProject(ws_order_number=[$0])
>     HiveSortLimit(fetch=[1])          <-- This shouldn't be added
>       HiveProject(ws_order_number=[$0])
>         HiveFilter(condition=[IS NOT NULL($0)])
>           HiveTableScan(table=[[default, web_sales]], table:alias=[ws2])
> {code}
> Limit n on the right side of the join reduces the result set coming from the 
> right to only n record hence not all the ws_order_number values are included 
> which leads to correctness issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to