[ https://issues.apache.org/jira/browse/HIVE-28234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sheng Zhao updated HIVE-28234: ------------------------------ Environment: Hive: 3.1.2 Spark: 3.0.2 Hdfs: 3.2.2 Yarn: 3.2.2 was: Version information Hive: 3.1.2 Spark: 3.0.2 Hdfs: 3.2.2 Yarn: 3.2.2 > Unexpected results for window function with unbounded following > --------------------------------------------------------------- > > Key: HIVE-28234 > URL: https://issues.apache.org/jira/browse/HIVE-28234 > Project: Hive > Issue Type: Bug > Environment: Hive: 3.1.2 > Spark: 3.0.2 > Hdfs: 3.2.2 > Yarn: 3.2.2 > Reporter: Sheng Zhao > Priority: Major > > I executed the following HQL, but the result I obtained did not match my > expectations > {code:java} > with tmp as > ( > select 2018 as year,3 as stu_len,2000 as num > union all > select 2019 as year,3 as stu_len,2000 as num > union all > select 2020 as year,4 as stu_len,1000 as num > union all > select 2020 as year,3 as stu_len,2000 as num > ) > select > year, > stu_len, > num, > sum(num) over(partition by year order by stu_len rows between unbounded > preceding and unbounded following) as sum_1 > from tmp; {code} > result: > {code:java} > year stu_len num sum_1 > 2018 3 2000 2000 > 2019 3 2000 2000 > 2020 3 2000 2000 -- why not 3000 > 2020 4 1000 3000 {code} > I tried both Spark and Tez engines, and the results were the same.It seems > that the rows between function is not working.The following is the execution > plan under the Spark engine. > {code:java} > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 depends on stages: Stage-1STAGE PLANS: > Stage: Stage-1 > Spark > Edges: > Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 1), Map 3 (PARTITION-LEVEL > SORT, 1), Map 4 (PARTITION-LEVEL SORT, 1), Map 5 (PARTITION-LEVEL SORT, 1) > DagName: hadoop_20240430034315_69edf282-0a84-4dc8-bb16-523d3cd98f33:30 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: _dummy_table > Row Limit Per Split: 1 > Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: 2018 (type: int), 3 (type: int), 2000 (type: > int) > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 1 Data size: 12 Basic stats: > COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int), _col1 (type: int) > sort order: ++ > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 4 Data size: 48 Basic stats: > COMPLETE Column stats: COMPLETE > value expressions: _col2 (type: int) > Map 3 > Map Operator Tree: > TableScan > alias: _dummy_table > Row Limit Per Split: 1 > Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: 2019 (type: int), 3 (type: int), 2000 (type: > int) > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 1 Data size: 12 Basic stats: > COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int), _col1 (type: int) > sort order: ++ > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 4 Data size: 48 Basic stats: > COMPLETE Column stats: COMPLETE > value expressions: _col2 (type: int) > Map 4 > Map Operator Tree: > TableScan > alias: _dummy_table > Row Limit Per Split: 1 > Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: 2020 (type: int), 4 (type: int), 1000 (type: > int) > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 1 Data size: 12 Basic stats: > COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int), _col1 (type: int) > sort order: ++ > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 4 Data size: 48 Basic stats: > COMPLETE Column stats: COMPLETE > value expressions: _col2 (type: int) > Map 5 > Map Operator Tree: > TableScan > alias: _dummy_table > Row Limit Per Split: 1 > Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: 2020 (type: int), 3 (type: int), 2000 (type: > int) > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 1 Data size: 12 Basic stats: > COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int), _col1 (type: int) > sort order: ++ > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 4 Data size: 48 Basic stats: > COMPLETE Column stats: COMPLETE > value expressions: _col2 (type: int) > Reducer 2 > Execution mode: vectorized > Reduce Operator Tree: > Select Operator > expressions: KEY.reducesinkkey0 (type: int), > KEY.reducesinkkey1 (type: int), VALUE._col0 (type: int) > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 4 Data size: 48 Basic stats: COMPLETE > Column stats: COMPLETE > PTF Operator > Function definitions: > Input definition > input alias: ptf_0 > output shape: _col0: int, _col1: int, _col2: int > type: WINDOWING > Windowing table definition > input alias: ptf_1 > name: windowingtablefunction > order by: _col1 ASC NULLS FIRST > partition by: _col0 > raw input shape: > window functions: > window function definition > alias: sum_window_0 > arguments: _col2 > name: sum > window function: GenericUDAFSumLong > window frame: ROWS PRECEDING(MAX)~FOLLOWING(MAX) > Statistics: Num rows: 4 Data size: 48 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: int), _col1 (type: int), _col2 > (type: int), sum_window_0 (type: bigint) > outputColumnNames: _col0, _col1, _col2, _col3 > Statistics: Num rows: 4 Data size: 80 Basic stats: > COMPLETE Column stats: COMPLETE > File Output Operator > compressed: false > Statistics: Num rows: 4 Data size: 80 Basic stats: > COMPLETE Column stats: COMPLETE > table: > input format: > org.apache.hadoop.mapred.SequenceFileInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat > serde: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 > Fetch Operator > limit: -1 > Processor Tree: > ListSink {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)