[ https://issues.apache.org/jira/browse/HIVE-25549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karen Coppage resolved HIVE-25549.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Committed to master branch. Thanks for the feedback [~abstractdog] and for the review [~szita]!

> Wrong results for window function with expression in PARTITION BY or ORDER BY clause
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-25549
>                 URL: https://issues.apache.org/jira/browse/HIVE-25549
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Karen Coppage
>            Assignee: Karen Coppage
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Sometimes the partition expression in a vectorized PTF needs some sort of transformation. For these to work, the partition expression may need some transient variables to be initialized.
>
> Example with row_number:
> {code:sql}
> create table test_rownumber (a string, b string) stored as orc;
> insert into test_rownumber values
> ('1', 'a'),
> ('2', 'b'),
> ('3', 'c'),
> ('4', 'd'),
> ('5', 'e');
> CREATE VIEW `test_rownumber_vue` AS SELECT
>   `test_rownumber`.`a` AS `a`,
>   CAST(`test_rownumber`.`a` as INT) AS `a_int`,
>   `test_rownumber`.`b` as `b`
> from `default`.`test_rownumber`;
> set hive.vectorized.execution.enabled=true;
> select *, row_number() over(partition by a_int order by b) from test_rownumber_vue;
> {code}
>
> Output is:
> {code:java}
> +-----------------------+---------------------------+-----------------------+----------------------+
> | test_rownumber_vue.a  | test_rownumber_vue.a_int  | test_rownumber_vue.b  | row_number_window_0  |
> +-----------------------+---------------------------+-----------------------+----------------------+
> | 1                     | 1                         | a                     | 1                    |
> | 2                     | 2                         | b                     | 2                    |
> | 3                     | 3                         | c                     | 3                    |
> | 4                     | 4                         | d                     | 4                    |
> | 5                     | 5                         | e                     | 5                    |
> +-----------------------+---------------------------+-----------------------+----------------------+
> {code}
>
> But it should be the following, because row numbering restarts for each partition (every a_int value is distinct, so each row forms its own partition):
> {code:java}
> +-----------------------+---------------------------+-----------------------+----------------------+
> | test_rownumber_vue.a  | test_rownumber_vue.a_int  | test_rownumber_vue.b  | row_number_window_0  |
> +-----------------------+---------------------------+-----------------------+----------------------+
> | 1                     | 1                         | a                     | 1                    |
> | 2                     | 2                         | b                     | 1                    |
> | 3                     | 3                         | c                     | 1                    |
> | 4                     | 4                         | d                     | 1                    |
> | 5                     | 5                         | e                     | 1                    |
> +-----------------------+---------------------------+-----------------------+----------------------+
> {code}
>
> Explanation:
> CastStringToLong has to be executed on the partition column (a_int). Because CastStringToLong.integerPrimitiveCategory is not initialized, all output of CastStringToLong is null, so a_int is interpreted as containing only null values. Partitioning is therefore effectively ignored: all rows fall into a single partition.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
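The mechanism in the Explanation above can be illustrated with a small, self-contained Java model. This is a sketch under stated assumptions, not Hive code: `RowNumberSketch`, `CastToLongExpr`, `targetCategory`, `init()` and `rowNumbers()` are hypothetical names, and Hive's actual vectorized PTF classes and fix are not shown. The model assumes rows arrive already sorted by partition key and then by `b`, and that null partition keys compare equal. Under those assumptions, a cast that returns null for every row merges all rows into one partition, so the row number never resets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified model of the failure described in HIVE-25549.
// None of these classes are part of Hive's vectorization API.
public class RowNumberSketch {

    // Stand-in for a cast expression whose behavior depends on a transient
    // field that must be set up before use (compare
    // CastStringToLong.integerPrimitiveCategory in the report).
    static class CastToLongExpr {
        private transient String targetCategory; // stays null until init() runs

        void init() {
            targetCategory = "INT";
        }

        Long evaluate(String value) {
            if (targetCategory == null) {
                return null; // uninitialized: every output is null
            }
            return Long.parseLong(value);
        }
    }

    // Computes row_number() over rows already sorted by (partition key, order key).
    // The counter restarts whenever the partition key changes; Objects.equals
    // treats two nulls as equal, so all-null keys form a single partition.
    static List<Integer> rowNumbers(List<Long> partitionKeys) {
        List<Integer> result = new ArrayList<>();
        int rowNumber = 0;
        for (int i = 0; i < partitionKeys.size(); i++) {
            boolean newPartition = i == 0
                    || !Objects.equals(partitionKeys.get(i), partitionKeys.get(i - 1));
            rowNumber = newPartition ? 1 : rowNumber + 1;
            result.add(rowNumber);
        }
        return result;
    }

    // Evaluates a_int = CAST(a AS INT) for the example rows, then numbers them.
    static List<Integer> run(boolean initialize) {
        String[] a = {"1", "2", "3", "4", "5"}; // column a of test_rownumber
        CastToLongExpr cast = new CastToLongExpr();
        if (initialize) {
            cast.init();
        }
        List<Long> keys = new ArrayList<>();
        for (String value : a) {
            keys.add(cast.evaluate(value));
        }
        return rowNumbers(keys);
    }

    public static void main(String[] args) {
        System.out.println("uninitialized: " + run(false));
        System.out.println("initialized:   " + run(true));
    }
}
```

Running `main` prints `[1, 2, 3, 4, 5]` for the uninitialized case, which matches the wrong output in the report, and `[1, 1, 1, 1, 1]` once `init()` has been called, which matches the expected output.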