[jira] [Updated] (HIVE-24804) RANGE with offset PRECEDING/FOLLOWING requires exactly one ORDER BY column

Jira Mon, 22 Feb 2021 02:17:11 -0800


     [ 
https://issues.apache.org/jira/browse/HIVE-24804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


László Bodor updated HIVE-24804:
--------------------------------
    Description: 
Currently, in Hive, we can run a windowing function with range specification 
but without an ORDER BY clause:
{code}
create table vector_ptf_part_simple_text(p_mfgr string, p_name string, 
p_retailprice double, rowindex string);
select p_mfgr, p_name, rowindex,
count(*) over(partition by p_mfgr range between 1 preceding and current row) as 
cs1,
count(*) over(partition by p_mfgr range between 3 preceding and current row) as 
cs2
from vector_ptf_part_simple_text;
{code}

This is confusing, becuase without an order by clause, the range is out of 
context, we don't know by which column should we calculate the range.

Tested on Postgres, it throws an exception:
{code}
create table vector_ptf_part_simple_text(p_mfgr varchar(10), p_name 
varchar(10), p_retailprice integer, rowindex varchar(10));

select p_mfgr, p_name, rowindex,
count(*) over(partition by p_mfgr range between 1 preceding and current row) as 
cs1,
count(*) over(partition by p_mfgr range between 3 preceding and current row) as 
cs2
from vector_ptf_part_simple_text;

*RANGE with offset PRECEDING/FOLLOWING requires exactly one ORDER BY column*
{code}



further references:
https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts
{code}
RANGE: Computes the window frame based on a logical range of rows around the 
current row, based on the current row’s ORDER BY key value. The provided range 
value is added or subtracted to the current row's key value to define a 
starting or ending range boundary for the window frame. In a range-based window 
frame, there must be exactly one expression in the ORDER BY clause, and the 
expression must have a numeric type.
{code}

https://docs.oracle.com/cd/E17952_01/mysql-8.0-en/window-functions-frames.html
{code}
Without ORDER BY: The default frame includes all partition rows (because, 
without ORDER BY, all partition rows are peers). The default is equivalent to 
this frame specification:
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
{code}
I believe this one could only make sense if you don't specify range, otherwise 
the sql statement reflects a different thing from which is returned by the 
engine

  was:
Currently, in Hive, we can run a windowing function with range specification 
but without an ORDER BY clause:
{code}
create table vector_ptf_part_simple_text(p_mfgr string, p_name string, 
p_retailprice double, rowindex string);
select p_mfgr, p_name, rowindex,
count(*) over(partition by p_mfgr range between 1 preceding and current row) as 
cs1,
count(*) over(partition by p_mfgr range between 3 preceding and current row) as 
cs2
from vector_ptf_part_simple_text;
{code}

This is confusing, becuase without an order by clause, the range is is out of 
context, we don't know by which column should we calculate the range.

Tested on Postgres, it throws an exception:
{code}
create table vector_ptf_part_simple_text(p_mfgr varchar(10), p_name 
varchar(10), p_retailprice integer, rowindex varchar(10));

select p_mfgr, p_name, rowindex,
count(*) over(partition by p_mfgr range between 1 preceding and current row) as 
cs1,
count(*) over(partition by p_mfgr range between 3 preceding and current row) as 
cs2
from vector_ptf_part_simple_text;

*RANGE with offset PRECEDING/FOLLOWING requires exactly one ORDER BY column*
{code}



further references:
https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts
{code}
RANGE: Computes the window frame based on a logical range of rows around the 
current row, based on the current row’s ORDER BY key value. The provided range 
value is added or subtracted to the current row's key value to define a 
starting or ending range boundary for the window frame. In a range-based window 
frame, there must be exactly one expression in the ORDER BY clause, and the 
expression must have a numeric type.
{code}

https://docs.oracle.com/cd/E17952_01/mysql-8.0-en/window-functions-frames.html
{code}
Without ORDER BY: The default frame includes all partition rows (because, 
without ORDER BY, all partition rows are peers). The default is equivalent to 
this frame specification:
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
{code}
I believe this one could only make sense if you don't specify range, otherwise 
the sql statement reflects a different thing from which is returned by the 
engine


> RANGE with offset PRECEDING/FOLLOWING requires exactly one ORDER BY column
> --------------------------------------------------------------------------
>
>                 Key: HIVE-24804
>                 URL: https://issues.apache.org/jira/browse/HIVE-24804
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, in Hive, we can run a windowing function with range specification 
> but without an ORDER BY clause:
> {code}
> create table vector_ptf_part_simple_text(p_mfgr string, p_name string, 
> p_retailprice double, rowindex string);
> select p_mfgr, p_name, rowindex,
> count(*) over(partition by p_mfgr range between 1 preceding and current row) 
> as cs1,
> count(*) over(partition by p_mfgr range between 3 preceding and current row) 
> as cs2
> from vector_ptf_part_simple_text;
> {code}
> This is confusing, becuase without an order by clause, the range is out of 
> context, we don't know by which column should we calculate the range.
> Tested on Postgres, it throws an exception:
> {code}
> create table vector_ptf_part_simple_text(p_mfgr varchar(10), p_name 
> varchar(10), p_retailprice integer, rowindex varchar(10));
> select p_mfgr, p_name, rowindex,
> count(*) over(partition by p_mfgr range between 1 preceding and current row) 
> as cs1,
> count(*) over(partition by p_mfgr range between 3 preceding and current row) 
> as cs2
> from vector_ptf_part_simple_text;
> *RANGE with offset PRECEDING/FOLLOWING requires exactly one ORDER BY column*
> {code}
> further references:
> https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts
> {code}
> RANGE: Computes the window frame based on a logical range of rows around the 
> current row, based on the current row’s ORDER BY key value. The provided 
> range value is added or subtracted to the current row's key value to define a 
> starting or ending range boundary for the window frame. In a range-based 
> window frame, there must be exactly one expression in the ORDER BY clause, 
> and the expression must have a numeric type.
> {code}
> https://docs.oracle.com/cd/E17952_01/mysql-8.0-en/window-functions-frames.html
> {code}
> Without ORDER BY: The default frame includes all partition rows (because, 
> without ORDER BY, all partition rows are peers). The default is equivalent to 
> this frame specification:
> RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
> {code}
> I believe this one could only make sense if you don't specify range, 
> otherwise the sql statement reflects a different thing from which is returned 
> by the engine



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (HIVE-24804) RANGE with offset PRECEDING/FOLLOWING requires exactly one ORDER BY column

Reply via email to