Sheng Zhao created HIVE-28234:
---------------------------------

             Summary: Unexpected results for window function with unbounded 
following
                 Key: HIVE-28234
                 URL: https://issues.apache.org/jira/browse/HIVE-28234
             Project: Hive
          Issue Type: Bug
         Environment: Version information

Hive: 3.1.2

Spark: 3.0.2

Hdfs: 3.2.2

Yarn: 3.2.2
            Reporter: Sheng Zhao


I executed the following HQL, but the result I obtained did not match my 
expectations
{code:java}
with tmp as
    (
     select 2018 as year,3 as stu_len,2000 as num
     union all
     select 2019 as year,3 as stu_len,2000 as num
     union all
     select 2020 as year,4 as stu_len,1000 as num
     union all
     select 2020 as year,3 as stu_len,2000 as num
    )
select
    year,
    stu_len,
    num,
    sum(num) over(partition by year order by stu_len rows between unbounded 
preceding and unbounded following) as sum_1
from tmp; {code}
result:
{code:java}
year stu_len num    sum_1
2018    3    2000    2000
2019    3    2000    2000
2020    3    2000    2000  -- why not 3000
2020    4    1000    3000 {code}
I tried both Spark and Tez engines, and the results were the same.It seems that 
the rows between function is not working.The following is the execution plan 
under the Spark engine.
{code:java}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1STAGE PLANS:
  Stage: Stage-1
    Spark
      Edges:
        Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 1), Map 3 (PARTITION-LEVEL 
SORT, 1), Map 4 (PARTITION-LEVEL SORT, 1), Map 5 (PARTITION-LEVEL SORT, 1)
      DagName: hadoop_20240430034315_69edf282-0a84-4dc8-bb16-523d3cd98f33:30
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: _dummy_table
                  Row Limit Per Split: 1
                  Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE 
Column stats: COMPLETE
                  Select Operator
                    expressions: 2018 (type: int), 3 (type: int), 2000 (type: 
int)
                    outputColumnNames: _col0, _col1, _col2
                    Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: int), _col1 (type: int)
                      sort order: ++
                      Map-reduce partition columns: _col0 (type: int)
                      Statistics: Num rows: 4 Data size: 48 Basic stats: 
COMPLETE Column stats: COMPLETE
                      value expressions: _col2 (type: int)
        Map 3
            Map Operator Tree:
                TableScan
                  alias: _dummy_table
                  Row Limit Per Split: 1
                  Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE 
Column stats: COMPLETE
                  Select Operator
                    expressions: 2019 (type: int), 3 (type: int), 2000 (type: 
int)
                    outputColumnNames: _col0, _col1, _col2
                    Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: int), _col1 (type: int)
                      sort order: ++
                      Map-reduce partition columns: _col0 (type: int)
                      Statistics: Num rows: 4 Data size: 48 Basic stats: 
COMPLETE Column stats: COMPLETE
                      value expressions: _col2 (type: int)
        Map 4
            Map Operator Tree:
                TableScan
                  alias: _dummy_table
                  Row Limit Per Split: 1
                  Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE 
Column stats: COMPLETE
                  Select Operator
                    expressions: 2020 (type: int), 4 (type: int), 1000 (type: 
int)
                    outputColumnNames: _col0, _col1, _col2
                    Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: int), _col1 (type: int)
                      sort order: ++
                      Map-reduce partition columns: _col0 (type: int)
                      Statistics: Num rows: 4 Data size: 48 Basic stats: 
COMPLETE Column stats: COMPLETE
                      value expressions: _col2 (type: int)
        Map 5
            Map Operator Tree:
                TableScan
                  alias: _dummy_table
                  Row Limit Per Split: 1
                  Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE 
Column stats: COMPLETE
                  Select Operator
                    expressions: 2020 (type: int), 3 (type: int), 2000 (type: 
int)
                    outputColumnNames: _col0, _col1, _col2
                    Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: int), _col1 (type: int)
                      sort order: ++
                      Map-reduce partition columns: _col0 (type: int)
                      Statistics: Num rows: 4 Data size: 48 Basic stats: 
COMPLETE Column stats: COMPLETE
                      value expressions: _col2 (type: int)
        Reducer 2
            Execution mode: vectorized
            Reduce Operator Tree:
              Select Operator
                expressions: KEY.reducesinkkey0 (type: int), KEY.reducesinkkey1 
(type: int), VALUE._col0 (type: int)
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 4 Data size: 48 Basic stats: COMPLETE 
Column stats: COMPLETE
                PTF Operator
                  Function definitions:
                      Input definition
                        input alias: ptf_0
                        output shape: _col0: int, _col1: int, _col2: int
                        type: WINDOWING
                      Windowing table definition
                        input alias: ptf_1
                        name: windowingtablefunction
                        order by: _col1 ASC NULLS FIRST
                        partition by: _col0
                        raw input shape:
                        window functions:
                            window function definition
                              alias: sum_window_0
                              arguments: _col2
                              name: sum
                              window function: GenericUDAFSumLong
                              window frame: ROWS PRECEDING(MAX)~FOLLOWING(MAX)
                  Statistics: Num rows: 4 Data size: 48 Basic stats: COMPLETE 
Column stats: COMPLETE
                  Select Operator
                    expressions: _col0 (type: int), _col1 (type: int), _col2 
(type: int), sum_window_0 (type: bigint)
                    outputColumnNames: _col0, _col1, _col2, _col3
                    Statistics: Num rows: 4 Data size: 80 Basic stats: COMPLETE 
Column stats: COMPLETE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 4 Data size: 80 Basic stats: 
COMPLETE Column stats: COMPLETE
                      table:
                          input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
                          output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                          serde: 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to