[ 
https://issues.apache.org/jira/browse/HIVE-15474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15764900#comment-15764900
 ] 

Jesus Camacho Rodriguez commented on HIVE-15474:
------------------------------------------------

[~xuefuz], thanks for leaving the comment; it would be great if you could take 
a look at the patch too.

Propagating limit _N_ to GBy is valid iff GBy columns are a prefix of the OBy 
columns. This holds because GBy will not produce duplicates for those columns, 
and the Hive implementation, which is based on RS, ensures that the GBy output 
actually follows the order of those columns. Thus, we know that the first _N_ 
records output by the GBy are the top _N_ records.
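
To illustrate the argument, here is a minimal Python sketch. It is not Hive 
code: the function names and the sample rows are made up for illustration, and 
it assumes ascending order everywhere and a GBy that emits one row per group 
in key order (as the RS-based implementation would). It compares the final 
top _N_ with and without truncating the GBy output first.

```python
from collections import Counter


def gby_is_prefix_of_oby(gby_cols, oby_cols):
    """Conservative check: GBy columns are the leading OBy columns, in order.

    Sort direction is ignored here; this sketch assumes ascending order.
    """
    return list(oby_cols[:len(gby_cols)]) == list(gby_cols)


def group_count(rows):
    """Simulate RS + GBy: one (key, value, agg1) row per group, in key order."""
    counts = Counter(rows)
    return [(k, v, c) for (k, v), c in sorted(counts.items())]


def top_n(grouped, sort_key, n, pushed_limit=None):
    """Simulate the final RS + limit, optionally truncating GBy output first."""
    if pushed_limit is not None:
        grouped = grouped[:pushed_limit]
    return sorted(grouped, key=sort_key)[:n]


# Hypothetical input rows: (key, value)
rows = [("b", "y"), ("a", "x"), ("b", "y"), ("c", "z"),
        ("a", "w"), ("c", "z"), ("c", "z")]
grouped = group_count(rows)


def by_keys(r):
    # order by key, value, agg1: GBy columns are a prefix of the OBy columns
    return (r[0], r[1], r[2])


def by_agg(r):
    # order by agg1 desc, key, value: GBy columns are NOT a prefix
    return (-r[2], r[0], r[1])


# Prefix case: pushing the limit into GBy does not change the result.
safe_same = top_n(grouped, by_keys, 2) == top_n(grouped, by_keys, 2, pushed_limit=2)

# Non-prefix case: pushing the limit drops groups that belong in the top N.
unsafe_same = top_n(grouped, by_agg, 2) == top_n(grouped, by_agg, 2, pushed_limit=2)
```

With these sample rows, `safe_same` is true and `unsafe_same` is false: once 
the ordering starts with the aggregate, the first _N_ groups in key order are 
no longer guaranteed to contain the top _N_ records.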

I took a conservative approach, as we need to be sure that we remain correct; 
the condition might be relaxed even further for some corner cases. However, we 
should not do it without double-checking the theoretical background.

> Extend limit propagation for chain of RS-GB-RS operators
> --------------------------------------------------------
>
>                 Key: HIVE-15474
>                 URL: https://issues.apache.org/jira/browse/HIVE-15474
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer
>    Affects Versions: 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-15474.patch
>
>
> The goal is to extend the work started in HIVE-14002.
> For instance, given the following query:
> {code:sql}
> explain
> select key, value, count(key + 1) as agg1 from src 
> group by key, value
> order by key, value, agg1 limit 20;
> {code}
> We can push the limit to the GBy operator. However, currently we do not do it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
