[jira] [Commented] (HIVE-7989) Optimize Windowing function performance for row frames

Hive QA (JIRA) Sat, 06 Sep 2014 01:15:07 -0700

    [ https://issues.apache.org/jira/browse/HIVE-7989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124393#comment-14124393 ]


Hive QA commented on HIVE-7989:
-------------------------------



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12666864/HIVE-7989.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6171 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/664/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/664/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-664/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12666864

> Optimize Windowing function performance for row frames
> ------------------------------------------------------
>
>                 Key: HIVE-7989
>                 URL: https://issues.apache.org/jira/browse/HIVE-7989
>             Project: Hive
>          Issue Type: Improvement
>          Components: PTF-Windowing
>    Affects Versions: 0.13.0
>            Reporter: Ankit Kamboj
>         Attachments: HIVE-7989.patch
>
>
> To find the aggregate value for each row, the current windowing function
> implementation creates a new aggregation buffer for each row, iterates over
> all the rows in the respective window frame, puts them in the buffer and then
> computes the aggregated value. This causes a bottleneck for partitions with a
> huge number of rows, because the process runs in O(n^2) time (n being the
> number of rows in a partition) for each partition. So, if a dataset has
> multiple partitions, each with millions of rows, aggregation over all rows
> will take days to finish.
> There is scope for optimization for row frames in the following cases:
> a) For an UNBOUNDED PRECEDING start and a bounded end: instead of iterating
> over the window frame again for each row, we can slide the end forward one
> row at a time and aggregate incrementally, since the start is fixed for every
> row. This gives a running time linear in the size of the partition.
> b) For a bounded start and an UNBOUNDED FOLLOWING end: instead of iterating
> over the window frame again for each row, we can slide the start one row at a
> time and aggregate in reverse, since the end is fixed for every row. This also
> gives a running time linear in the size of the partition.
> Also, in general, for both row and value frames, we don't need to iterate over
> the range and re-create the aggregation buffer if both the start and the end
> remain the same as for the previous row. Instead, we can reuse the previously
> created aggregation buffer.
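To make cases (a) and (b) concrete, here is a minimal Java sketch, assuming a SUM aggregate over a single partition held in a long[] array. It is not Hive's PTF code: the class RowFrameSum, its methods, and the use of Integer.MAX_VALUE to stand for UNBOUNDED are hypothetical. The naive method rescans every frame with a fresh accumulator (quadratic for unbounded frames); the other two keep one running accumulator and slide a single bound, so each row is added exactly once.

```java
import java.util.Arrays;

// Hypothetical illustration, not Hive's implementation: SUM over ROWS frames
// for one partition stored in an array. Integer.MAX_VALUE stands for UNBOUNDED.
public class RowFrameSum {
    static final int UNBOUNDED = Integer.MAX_VALUE;

    // Naive: fresh accumulator per row and a full rescan of the frame
    // [i - preceding, i + following], clipped to the partition.
    static long[] naive(long[] v, int preceding, int following) {
        int n = v.length;
        long[] out = new long[n];
        for (int i = 0; i < n; i++) {
            int start = (int) Math.max(0L, (long) i - preceding);
            int end = (int) Math.min(n - 1L, (long) i + following);
            long acc = 0; // new "aggregation buffer" for every row
            for (int j = start; j <= end; j++) {
                acc += v[j];
            }
            out[i] = acc;
        }
        return out;
    }

    // Case (a): ROWS BETWEEN UNBOUNDED PRECEDING AND <following> FOLLOWING.
    // The start is fixed at row 0, so only the end slides forward.
    static long[] unboundedPreceding(long[] v, int following) {
        int n = v.length;
        long[] out = new long[n];
        long acc = 0;
        int added = 0; // rows [0, added) are already in acc
        for (int i = 0; i < n; i++) {
            int end = (int) Math.min(n - 1L, (long) i + following);
            while (added <= end) {
                acc += v[added++];
            }
            out[i] = acc;
        }
        return out;
    }

    // Case (b): ROWS BETWEEN <preceding> PRECEDING AND UNBOUNDED FOLLOWING.
    // The end is fixed at the last row, so walk backwards and slide the start.
    static long[] unboundedFollowing(long[] v, int preceding) {
        int n = v.length;
        long[] out = new long[n];
        long acc = 0;
        int next = n - 1; // rows (next, n - 1] are already in acc
        for (int i = n - 1; i >= 0; i--) {
            int start = (int) Math.max(0L, (long) i - preceding);
            while (next >= start) {
                acc += v[next--];
            }
            out[i] = acc;
        }
        return out;
    }

    public static void main(String[] args) {
        long[] v = {3, 1, 4, 1, 5, 9, 2, 6};
        // ROWS BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING
        System.out.println(Arrays.toString(unboundedPreceding(v, 1))); // [4, 8, 9, 14, 23, 25, 31, 31]
        System.out.println(Arrays.equals(naive(v, UNBOUNDED, 1), unboundedPreceding(v, 1))); // true
        // ROWS BETWEEN 1 PRECEDING AND UNBOUNDED FOLLOWING
        System.out.println(Arrays.toString(unboundedFollowing(v, 1))); // [31, 31, 28, 27, 23, 22, 17, 8]
        System.out.println(Arrays.equals(naive(v, 1, UNBOUNDED), unboundedFollowing(v, 1))); // true
    }
}
```

Adding rows in reverse order in case (b) gives the same answer only because SUM is commutative; an order-sensitive aggregate would need extra care in that direction.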
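The reuse idea for unchanged frames can be sketched the same way. This is hypothetical Java, not Hive code: FrameResultReuse, scan and rangeCurrentRowEnds are invented names, and the previous row's result stands in for the reused aggregation buffer. With a value frame such as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, rows with equal order keys (peers) share the same frame, so only the first row of each peer group needs a scan.

```java
import java.util.Arrays;

// Hypothetical illustration, not Hive's implementation.
public class FrameResultReuse {
    static int aggregations = 0; // how many frames were actually scanned

    // SUM over rows [start, end] inclusive, rescanning the frame.
    static long scan(long[] v, int start, int end) {
        aggregations++;
        long acc = 0;
        for (int j = start; j <= end; j++) {
            acc += v[j];
        }
        return acc;
    }

    // SUM for each row given its frame bounds; when the frame is identical to
    // the previous row's frame, the previous result is reused instead of rescanning.
    static long[] withReuse(long[] v, int[] start, int[] end) {
        int n = v.length;
        long[] out = new long[n];
        for (int i = 0; i < n; i++) {
            if (i > 0 && start[i] == start[i - 1] && end[i] == end[i - 1]) {
                out[i] = out[i - 1]; // same frame: reuse previous result
            } else {
                out[i] = scan(v, start[i], end[i]);
            }
        }
        return out;
    }

    // For RANGE ... CURRENT ROW over sorted keys, a row's frame ends at its last peer.
    static int[] rangeCurrentRowEnds(int[] sortedKeys) {
        int n = sortedKeys.length;
        int[] end = new int[n];
        for (int i = n - 1; i >= 0; i--) {
            end[i] = (i < n - 1 && sortedKeys[i + 1] == sortedKeys[i]) ? end[i + 1] : i;
        }
        return end;
    }

    public static void main(String[] args) {
        int[] keys = {1, 1, 2, 3, 3, 3};
        long[] v = {10, 20, 5, 1, 2, 3};
        int[] start = new int[keys.length]; // all zeros: UNBOUNDED PRECEDING
        int[] end = rangeCurrentRowEnds(keys);
        System.out.println(Arrays.toString(withReuse(v, start, end))); // [30, 30, 35, 41, 41, 41]
        System.out.println("frames scanned: " + aggregations); // frames scanned: 3
    }
}
```

Six rows need only three scans here, one per distinct frame; the saving grows with the size of peer groups.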



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
