[ https://issues.apache.org/jira/browse/FLINK-11943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jark Wu updated FLINK-11943:
----------------------------
    Summary: Support TopN feature for SQL  (was: Support TopN feature for Flink SQL)

> Support TopN feature for SQL
> ----------------------------
>
>                 Key: FLINK-11943
>                 URL: https://issues.apache.org/jira/browse/FLINK-11943
>             Project: Flink
>          Issue Type: New Feature
>          Components: API / Table SQL
>            Reporter: Jark Wu
>            Priority: Major
>
> TopN is a frequently used feature in data analysis. We can use ORDER BY +
> LIMIT to easily express a TopN query, e.g. {{SELECT * FROM T ORDER BY amount
> DESC LIMIT 10}}.
> But this is a global TopN; there is also strong demand for per-group TopN,
> for example, the top 10 shops for each category. To avoid introducing new
> syntax for this, we would like to express it with traditional syntax, using
> {{ROW_NUMBER}} over a window plus {{FILTER}} to limit the number of rows.
> For example:
> SELECT *
> FROM (
>   SELECT category, shopId, sales,
>     [ROW_NUMBER()|RANK()|DENSE_RANK()] OVER
>       (PARTITION BY category ORDER BY sales ASC) as rownum
>   FROM shop_sales
> )
> WHERE rownum <= 10
> This issue aims to optimize this query into a {{Rank}} node instead of
> {{Over}} plus {{Calc}}, and to translate the {{Rank}} node into physical
> operators.
> There are several possible optimizations for the rank operator, depending on
> the input of the {{Rank}} node. We would like to implement a basic,
> one-size-fits-all implementation first and do the performance improvements
> later.
> Here is a brief design doc:
> https://docs.google.com/document/d/14JCV6X6hcpoA51loprgntZNxQ2NmnDLucxgGY8xVDuI/edit#



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
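
The per-group TopN semantics described in the issue can be illustrated outside of SQL. The following is a minimal Python sketch, not Flink's implementation: it emulates ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales) followed by a rownum <= N filter on an in-memory list. The function name top_n_per_group, the descending flag, and the sample rows are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout mirroring the shop_sales columns in the issue:
# (category, shopId, sales)
Row = Tuple[str, str, int]


def top_n_per_group(
    rows: Iterable[Row], n: int, descending: bool = False
) -> List[Tuple[str, str, int, int]]:
    """Emulate ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales)
    combined with the outer filter `rownum <= n`."""
    # PARTITION BY category
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)

    result = []
    for category in sorted(groups):
        # ORDER BY sales ASC (or DESC). Python's sort is stable, so ties keep
        # input order; in SQL the order among ties is not guaranteed.
        ordered = sorted(groups[category], key=lambda r: r[2], reverse=descending)
        # Number rows from 1 and keep only those with rownum <= n.
        for rownum, (cat, shop_id, sales) in enumerate(ordered[:n], start=1):
            result.append((cat, shop_id, sales, rownum))
    return result


# Made-up sample data, for illustration only.
shop_sales = [
    ("food", "s1", 30),
    ("food", "s2", 10),
    ("food", "s3", 20),
    ("toys", "s4", 5),
    ("toys", "s5", 50),
]

top2_asc = top_n_per_group(shop_sales, n=2)
top1_desc = top_n_per_group(shop_sales, n=1, descending=True)
```

This sketch covers ROW_NUMBER() only; RANK() and DENSE_RANK() differ in how rows with equal sales values are numbered.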