[ https://issues.apache.org/jira/browse/HIVE-24804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
László Bodor updated HIVE-24804: -------------------------------- Summary: Introduce check: RANGE with offset PRECEDING/FOLLOWING requires at least one ORDER BY column (was: Introduce check: RANGE with offset PRECEDING/FOLLOWING requires exactly one ORDER BY column) > Introduce check: RANGE with offset PRECEDING/FOLLOWING requires at least one > ORDER BY column > -------------------------------------------------------------------------------------------- > > Key: HIVE-24804 > URL: https://issues.apache.org/jira/browse/HIVE-24804 > Project: Hive > Issue Type: Bug > Reporter: László Bodor > Assignee: László Bodor > Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Currently, in Hive, we can run a windowing function with range specification > but without an ORDER BY clause: > {code} > create table vector_ptf_part_simple_text(p_mfgr string, p_name string, > p_retailprice double, rowindex string); > select p_mfgr, p_name, rowindex, > count(*) over(partition by p_mfgr range between 1 preceding and current row) > as cs1, > count(*) over(partition by p_mfgr range between 3 preceding and current row) > as cs2 > from vector_ptf_part_simple_text; > {code} > This is confusing, because without an order by clause, the range is out of > context, we don't know by which column should we calculate the range. > Tested on Postgres, it throws an exception: > {code} > create table vector_ptf_part_simple_text(p_mfgr varchar(10), p_name > varchar(10), p_retailprice integer, rowindex varchar(10)); > select p_mfgr, p_name, rowindex, > count(*) over(partition by p_mfgr range between 1 preceding and current row) > as cs1, > count(*) over(partition by p_mfgr range between 3 preceding and current row) > as cs2 > from vector_ptf_part_simple_text; > *RANGE with offset PRECEDING/FOLLOWING requires exactly one ORDER BY column* > {code} > further references: > https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts > {code} > RANGE: Computes the window frame based on a logical range of rows around the > current row, based on the current row’s ORDER BY key value. The provided > range value is added or subtracted to the current row's key value to define a > starting or ending range boundary for the window frame. In a range-based > window frame, there must be exactly one expression in the ORDER BY clause, > and the expression must have a numeric type. > {code} > https://docs.oracle.com/cd/E17952_01/mysql-8.0-en/window-functions-frames.html > {code} > Without ORDER BY: The default frame includes all partition rows (because, > without ORDER BY, all partition rows are peers). The default is equivalent to > this frame specification: > RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING > {code} > I believe this one could only make sense if you don't specify range, > otherwise the sql statement reflects a different thing from which is returned > by the engine -- This message was sent by Atlassian Jira (v8.3.4#803005)