[ https://issues.apache.org/jira/browse/HIVE-15474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15764900#comment-15764900 ]
Jesus Camacho Rodriguez commented on HIVE-15474:
------------------------------------------------

[~xuefuz], thanks for leaving the comment; it would be great if you could take a look at the patch too.

Propagating limit _N_ to GBy is valid iff the GBy columns are a prefix of the OBy columns. This holds because GBy will not produce duplicates for those columns, while Hive's implementation, which is based on RS, ensures that the GBy output actually follows a certain order. Thus, we know that the GBy will output the top _N_ records.

I took a conservative approach, as we need to be sure that we remain correct; it might be that the condition could be relaxed even further for some corner cases. However, we should not do that without double-checking the theoretical background.

> Extend limit propagation for chain of RS-GB-RS operators
> --------------------------------------------------------
>
> Key: HIVE-15474
> URL: https://issues.apache.org/jira/browse/HIVE-15474
> Project: Hive
> Issue Type: Bug
> Components: Physical Optimizer
> Affects Versions: 2.2.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15474.patch
>
>
> The goal is to extend the work started in HIVE-14002.
> For instance, given the following query:
> {code:sql}
> explain
> select key, value, count(key + 1) as agg1 from src
> group by key, value
> order by key, value, agg1 limit 20;
> {code}
> We can push the limit to the GBy operator. However, currently we do not do it.
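
To make the prefix argument from the comment concrete, here is a minimal Python sketch. It is not Hive code and not the logic of the attached patch; the names `is_prefix`, `full_plan`, and `pushed_plan` are hypothetical. It assumes ascending sort order on all columns and non-null keys, so that `count(key + 1)` equals the group size. It simulates the example query with and without the limit applied at the GBy output.

```python
from collections import Counter

def is_prefix(group_by_cols, order_by_cols):
    """True if the GROUP BY columns are the leading ORDER BY columns, in order."""
    n = len(group_by_cols)
    return list(order_by_cols[:n]) == list(group_by_cols)

def full_plan(rows, n):
    """group by key, value; order by key, value, agg1; limit n (no pushdown)."""
    groups = Counter(rows)  # (key, value) -> group size, standing in for agg1
    out = [(k, v, c) for (k, v), c in groups.items()]
    return sorted(out)[:n]

def pushed_plan(rows, n):
    """Same query, but the GBy output is truncated to n rows before the final sort."""
    groups = Counter(rows)
    # Assumption: the GBy emits groups ordered by (key, value), so keeping the
    # first n groups keeps the n smallest (key, value) pairs.
    top = sorted(groups.items())[:n]
    out = [(k, v, c) for (k, v), c in top]
    return sorted(out)[:n]

rows = [("a", "x"), ("a", "x"), ("b", "y"), ("a", "z"),
        ("c", "w"), ("b", "y"), ("b", "y")]

if is_prefix(["key", "value"], ["key", "value", "agg1"]):
    # Each (key, value) pair appears once after grouping, so agg1 never
    # breaks a tie and both plans return the same rows.
    assert full_plan(rows, 2) == pushed_plan(rows, 2)
```

In this sketch, `agg1` cannot affect which rows survive the limit because every `(key, value)` pair is unique after grouping. If the order-by instead started with `agg1`, `is_prefix` would return `False` and the early truncation could drop rows that belong in the final result.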