[jira] [Commented] (HIVE-5358) ReduceSinkDeDuplication should ignore column orders when check overlapping part of keys between parent and child

Ashutosh Chauhan (JIRA) Thu, 26 Sep 2013 08:31:17 -0700

    [ https://issues.apache.org/jira/browse/HIVE-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13778898#comment-13778898 ]


Ashutosh Chauhan commented on HIVE-5358:
----------------------------------------

For queries like:
{code}
select key, value from (select key, value from src group by key, value) t order by value, key;
select key, value from (select key, value from src order by key, value) t group by value, key;
{code}

In these cases the two RSs can also be merged, but the order of the key columns 
on the RS becomes important and must be preserved. Can you add a test case for 
this to make sure it works correctly?
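
To make the ordering concern concrete, here is a minimal Python sketch of an 
order-aware key check. It is not Hive's actual ReduceSinkDeDuplication code: 
the function name merged_keys and the two boolean flags are hypothetical, the 
sketch assumes both RSs use exactly the same set of key columns, and it 
conservatively refuses to merge when both sides require different orders.

```python
def merged_keys(parent, child, parent_ordered, child_ordered):
    """Return the key order a merged RS would need, or None if not mergeable.

    parent / child: key column names of the parent and child RS (hypothetical model).
    parent_ordered / child_ordered: True when that RS feeds an ORDER BY, so its
    key order is significant; False for GROUP BY, where any order works.
    """
    # Assumption: only the "same columns, possibly reordered" case is handled.
    if set(parent) != set(child) or len(parent) != len(child):
        return None
    if parent_ordered and child_ordered:
        # Both sides care about order; merge only if they already agree.
        return list(child) if list(parent) == list(child) else None
    if child_ordered:
        # e.g. inner GROUP BY key, value; outer ORDER BY value, key
        return list(child)
    # Parent is ordered, or neither side is: keep the parent's order.
    return list(parent)


# Inner GROUP BY key, value; outer ORDER BY value, key
print(merged_keys(["key", "value"], ["value", "key"], False, True))
# Inner ORDER BY key, value; outer GROUP BY value, key
print(merged_keys(["key", "value"], ["value", "key"], True, False))
```

In the first query above the merged RS has to sort by (value, key); in the 
second it has to keep (key, value). A plain set comparison would accept both 
but would not say which order to keep.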

For queries like:
{code}
set hive.optimize.correlation=true;
select key, value from (select s1.key, s2.value from src s1 join src s2 on s1.key = s2.key and s1.value = s2.value) t group by key, value;
select key, value from (select s1.key, s2.value from src s1 join src s2 on s1.key = s2.key and s1.value = s2.value) t group by value, key;
{code}
In these cases, too, the ordering of the key columns in the RS is important.

cc: [~navis] [~yhuai]
                
> ReduceSinkDeDuplication should ignore column orders when check overlapping 
> part of keys between parent and child
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-5358
>                 URL: https://issues.apache.org/jira/browse/HIVE-5358
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Chun Chen
>            Assignee: Chun Chen
>         Attachments: D13113.1.patch, HIVE-5358.2.patch, HIVE-5358.patch
>
>
> {code}
> select key, value from (select key, value from src group by key, value) t group by key, value;
> {code}
> This can be optimized by ReduceSinkDeDuplication
> {code}
> select key, value from (select key, value from src group by key, value) t group by value, key;
> {code}
> However, the SQL above currently can't be optimized by ReduceSinkDeDuplication 
> because the parent and child operators list their key columns in different orders.

