[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

Sergey Shelukhin (JIRA) Fri, 23 Oct 2015 11:43:04 -0700

    [ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971580#comment-14971580 ]


Sergey Shelukhin commented on HIVE-11531:
-----------------------------------------

Do you have more specific questions? My guess is that for these optimizers, the 
first step is to push down the total (offset + limit) in place of the old limit. 
I.e. if you select ... limit 10, 20 (offset 10, limit 20), it would push down 
limit 30, and then the Hive limit logic would discard the first 10 rows as usual. 
There are probably other optimizations possible as step 2, e.g. skipping 
evaluation for the first 10 rows in this case, but that may be more difficult. 
For now, the simple step should suffice.
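
To make the simple step concrete, here is a rough sketch in Python (not Hive 
code). The names top_n and limit_with_offset are made up for illustration, and 
heapq.nsmallest only stands in for a TopN-style optimization. It assumes the 
query has an ORDER BY, so "top N" is well defined.

```python
import heapq
from itertools import islice

def top_n(rows, n, key=None):
    """Stand-in for a pushed-down TopN: return the n smallest rows, sorted."""
    return heapq.nsmallest(n, rows, key=key)

def limit_with_offset(rows, offset, limit, key=None):
    """Emulate ORDER BY ... LIMIT offset, limit using a pushed-down bound."""
    # Step 1: push down the combined bound (offset + limit) in place of
    # the old limit, so the upstream stage keeps at most that many rows.
    pushed = top_n(rows, offset + limit, key=key)
    # Step 2: the final limit logic discards the first `offset` rows and
    # returns at most `limit` of the remaining ones.
    return list(islice(pushed, offset, offset + limit))

# LIMIT 10, 20 over 100 unsorted rows: the upstream stage keeps 30 rows,
# and the final step drops the first 10 of those.
rows = list(range(100, 0, -1))
page = limit_with_offset(rows, offset=10, limit=20)
```

The upstream stage never keeps more than offset + limit rows, which is the 
point of pushing the total down; skipping work for the discarded rows would be 
the separate step 2.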

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-11531
>                 URL: https://issues.apache.org/jira/browse/HIVE-11531
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Hui Zheng
>         Attachments: HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first-class support for 
> "skip" to the existing limit, or improve ROW_NUMBER for better performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
