[ 
https://issues.apache.org/jira/browse/HIVE-28234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sheng Zhao updated HIVE-28234:
------------------------------
    Environment: 
Hive: 3.1.2

Spark: 3.0.2

Hdfs: 3.2.2

Yarn: 3.2.2

  was:
Version information

Hive: 3.1.2

Spark: 3.0.2

Hdfs: 3.2.2

Yarn: 3.2.2


> Unexpected results for window function with unbounded following
> ---------------------------------------------------------------
>
>                 Key: HIVE-28234
>                 URL: https://issues.apache.org/jira/browse/HIVE-28234
>             Project: Hive
>          Issue Type: Bug
>         Environment: Hive: 3.1.2
> Spark: 3.0.2
> Hdfs: 3.2.2
> Yarn: 3.2.2
>            Reporter: Sheng Zhao
>            Priority: Major
>
> I executed the following HQL, but the result did not match my
> expectations:
> {code:java}
> with tmp as
>     (
>      select 2018 as year,3 as stu_len,2000 as num
>      union all
>      select 2019 as year,3 as stu_len,2000 as num
>      union all
>      select 2020 as year,4 as stu_len,1000 as num
>      union all
>      select 2020 as year,3 as stu_len,2000 as num
>     )
> select
>     year,
>     stu_len,
>     num,
>     sum(num) over(partition by year order by stu_len rows between unbounded 
> preceding and unbounded following) as sum_1
> from tmp; {code}
> result:
> {code:java}
> year  stu_len  num   sum_1
> 2018  3        2000  2000
> 2019  3        2000  2000
> 2020  3        2000  2000  -- why not 3000?
> 2020  4        1000  3000 {code}
> I tried both the Spark and Tez engines, and the results were the same. It
> seems that the ROWS BETWEEN frame is not being applied. Below is the
> execution plan under the Spark engine.
> {code:java}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
>
> STAGE PLANS:
>   Stage: Stage-1
>     Spark
>       Edges:
>         Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 1), Map 3 (PARTITION-LEVEL 
> SORT, 1), Map 4 (PARTITION-LEVEL SORT, 1), Map 5 (PARTITION-LEVEL SORT, 1)
>       DagName: hadoop_20240430034315_69edf282-0a84-4dc8-bb16-523d3cd98f33:30
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: _dummy_table
>                   Row Limit Per Split: 1
>                   Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE 
> Column stats: COMPLETE
>                   Select Operator
>                     expressions: 2018 (type: int), 3 (type: int), 2000 (type: 
> int)
>                     outputColumnNames: _col0, _col1, _col2
>                     Statistics: Num rows: 1 Data size: 12 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                     Reduce Output Operator
>                       key expressions: _col0 (type: int), _col1 (type: int)
>                       sort order: ++
>                       Map-reduce partition columns: _col0 (type: int)
>                       Statistics: Num rows: 4 Data size: 48 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                       value expressions: _col2 (type: int)
>         Map 3
>             Map Operator Tree:
>                 TableScan
>                   alias: _dummy_table
>                   Row Limit Per Split: 1
>                   Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE 
> Column stats: COMPLETE
>                   Select Operator
>                     expressions: 2019 (type: int), 3 (type: int), 2000 (type: 
> int)
>                     outputColumnNames: _col0, _col1, _col2
>                     Statistics: Num rows: 1 Data size: 12 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                     Reduce Output Operator
>                       key expressions: _col0 (type: int), _col1 (type: int)
>                       sort order: ++
>                       Map-reduce partition columns: _col0 (type: int)
>                       Statistics: Num rows: 4 Data size: 48 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                       value expressions: _col2 (type: int)
>         Map 4
>             Map Operator Tree:
>                 TableScan
>                   alias: _dummy_table
>                   Row Limit Per Split: 1
>                   Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE 
> Column stats: COMPLETE
>                   Select Operator
>                     expressions: 2020 (type: int), 4 (type: int), 1000 (type: 
> int)
>                     outputColumnNames: _col0, _col1, _col2
>                     Statistics: Num rows: 1 Data size: 12 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                     Reduce Output Operator
>                       key expressions: _col0 (type: int), _col1 (type: int)
>                       sort order: ++
>                       Map-reduce partition columns: _col0 (type: int)
>                       Statistics: Num rows: 4 Data size: 48 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                       value expressions: _col2 (type: int)
>         Map 5
>             Map Operator Tree:
>                 TableScan
>                   alias: _dummy_table
>                   Row Limit Per Split: 1
>                   Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE 
> Column stats: COMPLETE
>                   Select Operator
>                     expressions: 2020 (type: int), 3 (type: int), 2000 (type: 
> int)
>                     outputColumnNames: _col0, _col1, _col2
>                     Statistics: Num rows: 1 Data size: 12 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                     Reduce Output Operator
>                       key expressions: _col0 (type: int), _col1 (type: int)
>                       sort order: ++
>                       Map-reduce partition columns: _col0 (type: int)
>                       Statistics: Num rows: 4 Data size: 48 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                       value expressions: _col2 (type: int)
>         Reducer 2
>             Execution mode: vectorized
>             Reduce Operator Tree:
>               Select Operator
>                 expressions: KEY.reducesinkkey0 (type: int), 
> KEY.reducesinkkey1 (type: int), VALUE._col0 (type: int)
>                 outputColumnNames: _col0, _col1, _col2
>                 Statistics: Num rows: 4 Data size: 48 Basic stats: COMPLETE 
> Column stats: COMPLETE
>                 PTF Operator
>                   Function definitions:
>                       Input definition
>                         input alias: ptf_0
>                         output shape: _col0: int, _col1: int, _col2: int
>                         type: WINDOWING
>                       Windowing table definition
>                         input alias: ptf_1
>                         name: windowingtablefunction
>                         order by: _col1 ASC NULLS FIRST
>                         partition by: _col0
>                         raw input shape:
>                         window functions:
>                             window function definition
>                               alias: sum_window_0
>                               arguments: _col2
>                               name: sum
>                               window function: GenericUDAFSumLong
>                               window frame: ROWS PRECEDING(MAX)~FOLLOWING(MAX)
>                   Statistics: Num rows: 4 Data size: 48 Basic stats: COMPLETE 
> Column stats: COMPLETE
>                   Select Operator
>                     expressions: _col0 (type: int), _col1 (type: int), _col2 
> (type: int), sum_window_0 (type: bigint)
>                     outputColumnNames: _col0, _col1, _col2, _col3
>                     Statistics: Num rows: 4 Data size: 80 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                     File Output Operator
>                       compressed: false
>                       Statistics: Num rows: 4 Data size: 80 Basic stats: 
> COMPLETE Column stats: COMPLETE
>                       table:
>                           input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>                           output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                           serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink {code}
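As a point of comparison, the expected per-partition totals for this frame can be checked against another engine. A minimal sketch using Python's built-in sqlite3 module (SQLite >= 3.25 implements standard SQL window frames; this reproduction is an illustration, not part of the original report):

```python
import sqlite3

# Same data and window frame as the HQL in the report, run against SQLite.
conn = sqlite3.connect(":memory:")
rows = conn.execute("""
    WITH tmp(year, stu_len, num) AS (
        VALUES (2018, 3, 2000), (2019, 3, 2000),
               (2020, 4, 1000), (2020, 3, 2000)
    )
    SELECT year, stu_len, num,
           SUM(num) OVER (PARTITION BY year ORDER BY stu_len
                          ROWS BETWEEN UNBOUNDED PRECEDING
                                   AND UNBOUNDED FOLLOWING) AS sum_1
    FROM tmp
    ORDER BY year, stu_len
""").fetchall()

for row in rows:
    print(row)
# Both 2020 rows get sum_1 = 3000, because the frame spans the whole
# partition regardless of the ORDER BY position of the current row.
```

With an UNBOUNDED PRECEDING / UNBOUNDED FOLLOWING frame, every row in a partition should receive the full partition sum, which is why 3000 is expected for both 2020 rows above.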



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
