[ https://issues.apache.org/jira/browse/HIVE-21217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16761332#comment-16761332 ]
Vineet Garg edited comment on HIVE-21217 at 2/6/19 8:50 PM:
------------------------------------------------------------

[~szita] Would you mind providing an example query? Is this only valid for queries containing {{RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW}}?

was (Author: vgarg):
[~szita] Would you mind providing an example query? Is this only valid for queries containing {{ RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW}} ?

> Optimize range calculation for PTF
> ----------------------------------
>
>                 Key: HIVE-21217
>                 URL: https://issues.apache.org/jira/browse/HIVE-21217
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Adam Szita
>            Assignee: Adam Szita
>            Priority: Major
>
> During window function execution, Hive has to iterate over the rows
> neighbouring the current row to find the beginning and end of the range
> on which the aggregation will be executed.
> With range-based windows and many rows sharing the same key value, this
> can take a long time. For example, take a partition of 80M rows that falls
> into 2 ranges of 40M rows each according to the ORDER BY column: within
> each 40M-row range we perform about 40M x 40M/2 steps, which is O(n^2)
> time complexity.
> I propose introducing a cache that keeps track of already calculated range
> ends so that they can be reused in future scans.
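
To make the cost argument in the description concrete, here is a minimal sketch of the idea. It is not Hive's PTF code: the class RangeEndCacheSketch, its methods, and the choice of a HashMap keyed on the ORDER BY value are assumptions made for illustration only. It models a frame of RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW over one partition that is already sorted by an integer key, and counts the forward-scan steps with and without a cache of range ends.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: all names are hypothetical and not taken from Hive.
class RangeEndCacheSketch {
    // Number of forward-scan steps taken, so both strategies can be compared.
    static long steps = 0;

    // Naive scan: walk forward from row i while the ORDER BY key is unchanged.
    // Returns the exclusive end index of the frame for row i.
    static int naiveRangeEnd(int[] sortedKeys, int i) {
        int end = i + 1;
        while (end < sortedKeys.length && sortedKeys[end] == sortedKeys[i]) {
            steps++;
            end++;
        }
        return end;
    }

    // Cached scan: within one sorted partition, rows with equal keys share the
    // same range end, so the end is computed once per distinct key and reused.
    static int cachedRangeEnd(int[] sortedKeys, int i, Map<Integer, Integer> cache) {
        Integer hit = cache.get(sortedKeys[i]);
        if (hit != null) {
            return hit;
        }
        int end = naiveRangeEnd(sortedKeys, i);
        cache.put(sortedKeys[i], end);
        return end;
    }

    // Running COUNT(*) for every row of the partition: with an unbounded
    // preceding start, the frame of row i is rows [0, end).
    static long[] runningCount(int[] sortedKeys, boolean useCache) {
        Map<Integer, Integer> cache = new HashMap<>();
        long[] counts = new long[sortedKeys.length];
        for (int i = 0; i < sortedKeys.length; i++) {
            counts[i] = useCache
                    ? cachedRangeEnd(sortedKeys, i, cache)
                    : naiveRangeEnd(sortedKeys, i);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Scaled-down version of the example: two runs of equal keys.
        int[] keys = new int[2000];
        for (int i = 1000; i < 2000; i++) {
            keys[i] = 1;
        }

        steps = 0;
        long[] naive = runningCount(keys, false);
        long naiveSteps = steps;

        steps = 0;
        long[] cached = runningCount(keys, true);
        long cachedSteps = steps;

        System.out.println("same results: " + Arrays.equals(naive, cached));
        System.out.println("naive steps:  " + naiveSteps);
        System.out.println("cached steps: " + cachedSteps);
    }
}
```

In this sketch the naive scan grows quadratically with the length of each run of equal keys, while the cached scan walks each run once. A real implementation would also have to reset the cache at partition boundaries and handle range starts, numeric offsets, and null ordering, none of which is modelled here.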