[ https://issues.apache.org/jira/browse/HIVE-15474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15786788#comment-15786788 ]

Rui Li commented on HIVE-15474:
-------------------------------

[~jcamachorodriguez], thanks very much for the detailed explanations :)
For Spark, the operator chain is something like this:
{{GBY1 – RS2 – GBY3 – RS4 – SEL5 – FS6}}
Since RS2 can produce the top N keys, I think this optimization doesn't require the input to GBY3 to be sorted. I mean we still feed the top N keys to GBY3, but after shuffling, those keys may not be in sorted order. And the result should remain correct. Is that right?

> Extend limit propagation for chain of RS-GB-RS operators
> --------------------------------------------------------
>
>                 Key: HIVE-15474
>                 URL: https://issues.apache.org/jira/browse/HIVE-15474
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer
>    Affects Versions: 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-15474.patch
>
>
> The goal is to extend the work started in HIVE-14002.
> For instance, given the following query:
> {code:sql}
> explain
> select key, value, count(key + 1) as agg1 from src
> group by key, value
> order by key, value, agg1 limit 20;
> {code}
> We generate the following physical plan:
> {{TS1 - GBY2 - RS3 - GBY4 - RS5 - SEL6 - LIM7 - FS8}}
> We can push the limit to RS3 operator, as we will generate records for the
> _top N_ keys, and thus, GBY4 will produce the _top N_ results. However,
> currently we do not do it.
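
To make the question above concrete, here is a minimal Python sketch of the idea, not Hive or Spark code. It assumes each map task does a partial count per (key, value) group, that the first RS forwards only that task's N smallest group keys, and that keys are never null (so count(key + 1) is just a row count). All function names (partial_counts, top_n_keys, final_aggregate, run) are hypothetical and exist only for this illustration. The records are deliberately shuffled before the final aggregation to mimic unsorted input to the second GBY.

```python
import random
from collections import Counter

def partial_counts(rows):
    """Map-side GBY (hypothetical): partial row count per (key, value) group."""
    return Counter(rows)

def top_n_keys(partials, n):
    """First RS with the limit pushed down: forward only the n smallest group keys."""
    keep = sorted(partials)[:n]
    return [(k, partials[k]) for k in keep]

def final_aggregate(records):
    """Reduce-side GBY: merge partial counts. Arrival order does not matter."""
    merged = Counter()
    for key, cnt in records:
        merged[key] += cnt
    return merged

def run(mapper_inputs, n, push_limit, rng):
    shuffled = []
    for rows in mapper_inputs:
        partials = partial_counts(rows)
        if push_limit:
            shuffled.extend(top_n_keys(partials, n))
        else:
            shuffled.extend(partials.items())
    rng.shuffle(shuffled)  # keys reach the final GBY in no particular order
    merged = final_aggregate(shuffled)
    # Second RS + LIM: order by key, value, agg1 and keep n rows.
    return sorted((k, v, c) for (k, v), c in merged.items())[:n]

# Four simulated map tasks over synthetic (key, value) rows.
data_rng = random.Random(0)
mappers = [
    [(data_rng.randrange(30), data_rng.choice("ab")) for _ in range(200)]
    for _ in range(4)
]
baseline = run(mappers, 20, False, random.Random(1))
pushed = run(mappers, 20, True, random.Random(2))
assert baseline == pushed
```

Under these assumptions the two plans agree because any group key that is among the global top N is also among the local top N of every task in which it appears, so none of its partial counts are dropped. Groups outside the global top N may end up with incomplete counts, but the final sort and limit discard them.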