[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function

Yongzhi Chen (JIRA) Thu, 20 Aug 2015 19:12:06 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706117#comment-14706117
 ]


Yongzhi Chen commented on HIVE-11604:
-------------------------------------

This issue has a workaround: set hive.optimize.ppd=false
Comparing query plan when ppd on/off, the major difference is a SelectOperator 
following PTF Operator when ppd is off while there is no SelectOperator when 
ppd is on. 
Debugging the code: Originally ...PTF---> SEL..., ppd push a filter between 
them ...PTF-->FIL-->SEL; After that, many optimizers work on the SEL(for 
example, ColumnPruner, SelectDecup ...) Then, IdentityProjectRemover remove the 
SEL because it is an IdentitySelect. After the SEL is removed, the plan is: PTF 
-->FIL-->File Output Operator . PTF does not have right column output sequence, 
then later cause join return wrong result. We should not remove the first SEL 
following the PTF operator. Make the code change accordingly.

> HIVE return wrong results in some queries with PTF function
> -----------------------------------------------------------
>
>                 Key: HIVE-11604
>                 URL: https://issues.apache.org/jira/browse/HIVE-11604
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer
>    Affects Versions: 1.2.0, 1.1.0
>            Reporter: Yongzhi Chen
>            Assignee: Yongzhi Chen
>
> Following query returns empty result which is not right:
> {noformat}
> select ddd.id, ddd.fkey, aaa.name
> from (
>     select id, fkey, 
>     row_number() over (partition by id, fkey) as rnum
>     from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> {noformat}
> After remove row_number() over (partition by id, fkey) as rnum from query, 
> the right result returns.
> Reproduce:
> {noformat}
> create table tlb1 (id int, fkey int, val string);
> create table tlb2 (fid int, name string);
> insert into table tlb1 values(100,1,'abc');
> insert into table tlb1 values(200,1,'efg');
> insert into table tlb2 values(1, 'key1');
> select ddd.id, ddd.fkey, aaa.name
> from (
>     select id, fkey, 
>     row_number() over (partition by id, fkey) as rnum
>     from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> ....
> INFO  : Ended Job = job_local1070163923_0017
> +---------+-----------+-----------+--+
> No rows selected (14.248 seconds)
> | ddd.id  | ddd.fkey  | aaa.name  |
> +---------+-----------+-----------+--+
> +---------+-----------+-----------+--+
> 0: jdbc:hive2://localhost:10000> select ddd.id, ddd.fkey, aaa.name
> from (
>     select id, fkey 
>     from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;select ddd.id, ddd.fkey, aaa.name
> 0: jdbc:hive2://localhost:10000> from (
> 0: jdbc:hive2://localhost:10000>     select id, fkey 
> 0: jdbc:hive2://localhost:10000>     from tlb1 group by id, fkey
> 0: jdbc:hive2://localhost:10000>  ) ddd 
> 0: jdbc:hive2://localhost:10000> 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> INFO  : Number of reduce tasks not specified. Estimated from input data size: 
> 1
> ...
> INFO  : Ended Job = job_local672340505_0019
> +---------+-----------+-----------+--+
> 2 rows selected (14.383 seconds)
> | ddd.id  | ddd.fkey  | aaa.name  |
> +---------+-----------+-----------+--+
> | 100     | 1         | key1      |
> | 200     | 1         | key1      |
> +---------+-----------+-----------+--+
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function

Reply via email to