[ https://issues.apache.org/jira/browse/HIVE-21217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766518#comment-16766518 ]
Hive QA commented on HIVE-21217:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12958422/HIVE-21217.0.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 15796 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[windowing_range_multiorder] (batchId=7)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_windowing_range_multiorder] (batchId=163)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16039/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16039/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16039/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12958422 - PreCommit-HIVE-Build

> Optimize range calculation for PTF
> ----------------------------------
>
>                 Key: HIVE-21217
>                 URL: https://issues.apache.org/jira/browse/HIVE-21217
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Adam Szita
>            Assignee: Adam Szita
>            Priority: Major
>         Attachments: HIVE-21217.0.patch
>
>
> During window function execution Hive has to iterate on neighbouring rows of
> the current row to find the beginning and end of the proper range (on which
> the aggregation will be executed).
> When we're using range based windows and have many rows with a certain key
> value this can take a lot of time. (e.g. partition size of 80M, in which we
> have 2 ranges of 40M rows according to the orderby column: within these 40M
> rowsets we're doing 40M x 40M/2 steps.. which is of n^2 time complexity)
> I propose to introduce a cache that keeps track of already calculated range
> ends so it can be reused in future scans.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
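
The quadratic cost and the caching idea from the quoted issue description can be illustrated with a small sketch. This is not the code in HIVE-21217.0.patch and it is written in Python rather than Hive's Java. The function names (`range_end_scan`, `all_range_ends`) are hypothetical, and the sketch assumes the simplest case: an ascending integer order-by column and only the end of a `RANGE ... n FOLLOWING` window.

```python
from typing import Dict, List, Tuple


def range_end_scan(values: List[int], row: int, following: int) -> Tuple[int, int]:
    """Scan forward from `row` to find the exclusive end of its range.

    Returns (end_index, steps), where steps counts the rows visited.
    `values` is the order-by column of one partition, sorted ascending.
    """
    limit = values[row] + following
    end = row
    steps = 0
    while end < len(values) and values[end] <= limit:
        end += 1
        steps += 1
    return end, steps


def all_range_ends(values: List[int], following: int,
                   use_cache: bool) -> Tuple[List[int], int]:
    """Compute the range end for every row, optionally reusing earlier results.

    The cache maps an order-by key value to the range end already found for
    it, so rows sharing a key skip the scan entirely (hypothetical design,
    assumed here for illustration).
    """
    cache: Dict[int, int] = {}
    ends: List[int] = []
    total_steps = 0
    for row, key in enumerate(values):
        if use_cache and key in cache:
            ends.append(cache[key])
            continue
        end, steps = range_end_scan(values, row, following)
        total_steps += steps
        if use_cache:
            cache[key] = end
        ends.append(end)
    return ends, total_steps


if __name__ == "__main__":
    # Two groups of equal keys, a tiny stand-in for the 2 x 40M example.
    partition = [1] * 4 + [2] * 4
    print(all_range_ends(partition, 0, use_cache=False))
    print(all_range_ends(partition, 0, use_cache=True))
```

Without the cache, a group of k equal keys costs k(k+1)/2 scan steps, which matches the roughly 40M x 40M/2 figure in the description. With the cache, each distinct key value is scanned once and every later row with that key is a lookup.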