[ https://issues.apache.org/jira/browse/HIVE-7989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124393#comment-14124393 ]
Hive QA commented on HIVE-7989:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12666864/HIVE-7989.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6171 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/664/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/664/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-664/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12666864

> Optimize Windowing function performance for row frames
> ------------------------------------------------------
>
> Key: HIVE-7989
> URL: https://issues.apache.org/jira/browse/HIVE-7989
> Project: Hive
> Issue Type: Improvement
> Components: PTF-Windowing
> Affects Versions: 0.13.0
> Reporter: Ankit Kamboj
> Attachments: HIVE-7989.patch
>
>
> To find the aggregate value for each row, the current windowing function
> implementation creates a new aggregation buffer for each row, iterates over
> all the rows in the respective window frame, puts them in the buffer and then
> computes the aggregated value. This becomes a bottleneck for partitions with a
> huge number of rows, because the process takes O(n^2) time per partition in
> the worst case (n being the number of rows in the partition).
>
> So, if a dataset has multiple partitions, each with millions of rows,
> aggregating all rows can take days to finish.
>
> There is scope for optimizing row frames in the following cases:
> a) For an UNBOUNDED PRECEDING start and a bounded end: instead of iterating
> over the window frame again for each row, we can slide the end one row at a
> time and keep aggregating, since the start is fixed for every row. The
> running time is then linear in the size of the partition.
> b) For a bounded start and an UNBOUNDED FOLLOWING end: instead of iterating
> over the window frame again for each row, we can slide the start one row at
> a time and aggregate in reverse, since the end is fixed for every row. The
> running time is then linear in the size of the partition.
>
> Also, in general, for both row and value frames, we don't need to iterate
> over the range and re-create the aggregation buffer if both the start and
> the end remain the same. Instead, we can re-use the previously created
> aggregation buffer.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
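The per-row re-aggregation described in the issue, and the three proposed optimizations, can be sketched roughly as follows. This is a minimal, hypothetical illustration, not Hive's PTF/windowing code: the class WindowSumSketch, its methods, the UNBOUNDED sentinel and the rowsVisited counter are invented for this example. SUM over an in-memory long[] partition stands in for a general aggregate, and frames are limited to N PRECEDING / CURRENT ROW / N FOLLOWING bounds that contain the current row. Cases (a) and (b) assume the aggregate can report its running value after each row without being reset, and (b) also assumes the result does not depend on the order in which rows are fed. Both hold for SUM but would need checking for each real UDAF. The frame-reuse variant takes precomputed frame bounds, so it applies equally to value (RANGE) frames, where peer rows share a frame.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (names invented, not Hive's API): SUM over window
 * frames of one in-memory partition, naive vs. the HIVE-7989 proposals.
 */
final class WindowSumSketch {
    static final int UNBOUNDED = -1;   // sentinel for UNBOUNDED PRECEDING/FOLLOWING (sketch-only)
    static long rowsVisited;           // rows fed into a buffer by the most recent call

    /** Naive: fresh buffer and a full scan of the frame for every row. */
    static long[] naive(long[] p, int preceding, int following) {
        rowsVisited = 0;
        long[] out = new long[p.length];
        for (int i = 0; i < p.length; i++) {
            int start = preceding == UNBOUNDED ? 0 : Math.max(0, i - preceding);
            int end = following == UNBOUNDED ? p.length - 1 : Math.min(p.length - 1, i + following);
            long sum = 0;                                   // new "aggregation buffer"
            for (int j = start; j <= end; j++) { sum += p[j]; rowsVisited++; }
            out[i] = sum;
        }
        return out;
    }

    /** (a) UNBOUNDED PRECEDING start, bounded end: one buffer, slide the end forward. */
    static long[] unboundedPreceding(long[] p, int following) {
        rowsVisited = 0;
        long[] out = new long[p.length];
        long sum = 0;
        int next = 0;                                       // first row not yet aggregated
        for (int i = 0; i < p.length; i++) {
            int end = Math.min(p.length - 1, i + following);
            while (next <= end) { sum += p[next++]; rowsVisited++; }
            out[i] = sum;
        }
        return out;
    }

    /** (b) Bounded start, UNBOUNDED FOLLOWING end: walk backwards, slide the start. */
    static long[] unboundedFollowing(long[] p, int preceding) {
        rowsVisited = 0;
        long[] out = new long[p.length];
        long sum = 0;
        int next = p.length - 1;                            // last row not yet aggregated
        for (int i = p.length - 1; i >= 0; i--) {
            int start = Math.max(0, i - preceding);
            while (next >= start) { sum += p[next--]; rowsVisited++; }
            out[i] = sum;
        }
        return out;
    }

    /** Any frame type: frame i is [starts[i], ends[i]]; reuse the previous result if unchanged. */
    static long[] withFrameReuse(long[] p, int[] starts, int[] ends) {
        rowsVisited = 0;
        long[] out = new long[p.length];
        for (int i = 0; i < p.length; i++) {
            if (i > 0 && starts[i] == starts[i - 1] && ends[i] == ends[i - 1]) {
                out[i] = out[i - 1];                        // same frame: no new buffer, no re-scan
                continue;
            }
            long sum = 0;                                   // frame changed: aggregate it once
            for (int j = starts[i]; j <= ends[i]; j++) { sum += p[j]; rowsVisited++; }
            out[i] = sum;
        }
        return out;
    }

    public static void main(String[] args) {
        long[] rows = {5, 1, 4, 2, 3};
        // ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        System.out.println("naive " + Arrays.toString(naive(rows, UNBOUNDED, 0)) + " visits=" + rowsVisited);
        System.out.println("(a)   " + Arrays.toString(unboundedPreceding(rows, 0)) + " visits=" + rowsVisited);
        // ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
        System.out.println("(b)   " + Arrays.toString(unboundedFollowing(rows, 0)) + " visits=" + rowsVisited);
        // RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW over ORDER BY keys 1,1,2,2,2 (peers share a frame)
        int[] starts = {0, 0, 0, 0, 0}, ends = {1, 1, 4, 4, 4};
        System.out.println("reuse " + Arrays.toString(withFrameReuse(rows, starts, ends)) + " visits=" + rowsVisited);
    }
}
```

For the five-row partition in main, the naive method feeds 15 rows (n(n+1)/2 for n = 5) into buffers, while (a) and (b) each feed 5, one per row. The reuse variant feeds 7, because the RANGE-style bounds produce only two distinct frames.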