[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971580#comment-14971580 ]
Sergey Shelukhin commented on HIVE-11531:
-----------------------------------------

Do you have more specific questions? My guess is that for these optimizers, the first thing to do is to push down the total (offset + limit) in place of the old limit. For example, for SELECT ... LIMIT 10, 20 it would push down LIMIT 30, and Hive would then discard the first 10 rows as usual. Another optimization is probably possible as step 2, i.e. not evaluating anything for the first 10 rows in this case, but it may be more difficult. For now, the simple step should suffice.

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -----------------------------------------------------------------------------
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
> Issue Type: Improvement
> Reporter: Sergey Shelukhin
> Assignee: Hui Zheng
> Attachments: HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be
> paginated (which can be extremely large by itself). At present, ROW_NUMBER
> can be used to achieve this effect, but optimizations for LIMIT such as TopN
> in ReduceSink do not apply to ROW_NUMBER. We can add first-class support for
> "skip" to the existing limit, or improve ROW_NUMBER for better performance.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
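The first step proposed in the comment (push down offset + limit in place of the limit, then discard the first offset rows) can be illustrated with a minimal Java sketch. It assumes the rows are just an in-memory list of integer sort keys, and it uses a bounded max-heap to mimic the idea of a TopN stage. The class and method names (`LimitOffsetPushdown`, `topN`, `limitWithOffset`) are hypothetical and are not part of Hive's API or operator code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not Hive's operator code.
public class LimitOffsetPushdown {

    // Stand-in for a TopN stage: keeps the n smallest keys and returns them sorted.
    static List<Integer> topN(List<Integer> rows, int n) {
        List<Integer> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Max-heap bounded to n entries: the root is the largest key retained so far.
        PriorityQueue<Integer> heap = new PriorityQueue<>(Collections.reverseOrder());
        for (int row : rows) {
            if (heap.size() < n) {
                heap.add(row);
            } else if (row < heap.peek()) {
                heap.poll(); // evict the current largest retained key
                heap.add(row);
            }
        }
        result.addAll(heap);
        Collections.sort(result);
        return result;
    }

    // MySQL-style LIMIT offset, count: push down offset + count, then skip offset rows.
    static List<Integer> limitWithOffset(List<Integer> rows, int offset, int count) {
        List<Integer> top = topN(rows, offset + count);
        if (offset >= top.size()) {
            return new ArrayList<>(); // offset lies past the end of the result
        }
        return new ArrayList<>(top.subList(offset, top.size()));
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 100; i >= 1; i--) {
            rows.add(i); // 100, 99, ..., 1
        }
        // Like ORDER BY key LIMIT 10, 20: TopN keeps 30 rows, the first 10 are dropped.
        System.out.println(limitWithOffset(rows, 10, 20));
    }
}
```

Running `main` prints the keys 11 through 30. The sketch also shows why step 2 is harder: the TopN stage still has to retain and order all offset + limit rows, including the 10 that are thrown away at the end.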