Sheng Zhao created HIVE-28234: --------------------------------- Summary: Unexpected results for window function with unbounded following Key: HIVE-28234 URL: https://issues.apache.org/jira/browse/HIVE-28234 Project: Hive Issue Type: Bug Environment: Version information
Hive: 3.1.2 Spark: 3.0.2 Hdfs: 3.2.2 Yarn: 3.2.2 Reporter: Sheng Zhao I executed the following HQL, but the result I obtained did not match my expectations {code:java} with tmp as ( select 2018 as year,3 as stu_len,2000 as num union all select 2019 as year,3 as stu_len,2000 as num union all select 2020 as year,4 as stu_len,1000 as num union all select 2020 as year,3 as stu_len,2000 as num ) select year, stu_len, num, sum(num) over(partition by year order by stu_len rows between unbounded preceding and unbounded following) as sum_1 from tmp; {code} result: {code:java} year stu_len num sum_1 2018 3 2000 2000 2019 3 2000 2000 2020 3 2000 2000 -- why not 3000 2020 4 1000 3000 {code} I tried both Spark and Tez engines, and the results were the same.It seems that the rows between function is not working.The following is the execution plan under the Spark engine. {code:java} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1STAGE PLANS: Stage: Stage-1 Spark Edges: Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 1), Map 3 (PARTITION-LEVEL SORT, 1), Map 4 (PARTITION-LEVEL SORT, 1), Map 5 (PARTITION-LEVEL SORT, 1) DagName: hadoop_20240430034315_69edf282-0a84-4dc8-bb16-523d3cd98f33:30 Vertices: Map 1 Map Operator Tree: TableScan alias: _dummy_table Row Limit Per Split: 1 Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: 2018 (type: int), 3 (type: int), 2000 (type: int) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int), _col1 (type: int) sort order: ++ Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 4 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col2 (type: int) Map 3 Map Operator Tree: TableScan alias: _dummy_table Row Limit Per Split: 1 Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: 2019 (type: int), 3 (type: int), 2000 (type: int) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int), _col1 (type: int) sort order: ++ Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 4 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col2 (type: int) Map 4 Map Operator Tree: TableScan alias: _dummy_table Row Limit Per Split: 1 Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: 2020 (type: int), 4 (type: int), 1000 (type: int) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int), _col1 (type: int) sort order: ++ Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 4 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col2 (type: int) Map 5 Map Operator Tree: TableScan alias: _dummy_table Row Limit Per Split: 1 Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: 2020 (type: int), 3 (type: int), 2000 (type: int) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int), _col1 (type: int) sort order: ++ Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 4 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col2 (type: int) Reducer 2 Execution mode: vectorized Reduce Operator Tree: Select Operator expressions: KEY.reducesinkkey0 (type: int), KEY.reducesinkkey1 (type: int), VALUE._col0 (type: int) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 4 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE PTF Operator Function definitions: Input definition input alias: ptf_0 output shape: _col0: int, _col1: int, _col2: int type: WINDOWING Windowing table definition input alias: ptf_1 name: windowingtablefunction order by: _col1 ASC NULLS FIRST partition by: _col0 raw input shape: window functions: window function definition alias: sum_window_0 arguments: _col2 name: sum window function: GenericUDAFSumLong window frame: ROWS PRECEDING(MAX)~FOLLOWING(MAX) Statistics: Num rows: 4 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col0 (type: int), _col1 (type: int), _col2 (type: int), sum_window_0 (type: bigint) outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 4 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE File Output Operator compressed: false Statistics: Num rows: 4 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)