[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

Hive QA (JIRA) Mon, 23 Mar 2015 18:30:13 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377054#comment-14377054
 ]


Hive QA commented on HIVE-10036:
--------------------------------



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12706769/HIVE-10036.3.patch

{color:green}SUCCESS:{color} +1 7824 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3123/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3123/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3123/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12706769 - PreCommit-HIVE-TRUNK-Build

> Writing ORC format big table causes OOM - too many fixed sized stream buffers
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-10036
>                 URL: https://issues.apache.org/jira/browse/HIVE-10036
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Selina Zhang
>            Assignee: Selina Zhang
>         Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
> HIVE-10036.3.patch
>
>
> The ORC writer keeps multiple output streams for each column. Each output 
> stream is allocated a fixed-size ByteBuffer (configurable, 256K by default). 
> For a big table, the memory cost is unbearable, especially when HCatalog 
> dynamic partitioning is involved: several hundred files may be open and 
> written at the same time (FileSinkOperator has the same problem). 
> The global ORC memory manager controls the buffer size, but it only kicks in 
> at 5000-row intervals. That could be enhanced, but reducing the buffer size 
> leads to worse compression and more I/O on the read path. Sacrificing read 
> performance is never a good choice. 
> I changed the fixed-size ByteBuffer to a dynamically growing buffer that is 
> bounded above by the existing configurable buffer size. Most streams do not 
> need a large buffer, so performance improved significantly. Compared to 
> Facebook's hive-dwrf, I observed a 2x performance gain with this fix. 
> Solving OOM for ORC completely may take a lot of effort, but this is 
> definitely low-hanging fruit. 
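
The idea in the quoted description, a buffer that starts small and grows on demand up to the configured stream buffer size, can be illustrated with a minimal sketch. This is not the code from the attached patches: the class name GrowableStreamBuffer, its methods, the doubling growth policy, and the 4K starting size used in the usage note are all assumptions made for illustration. Only the 256K upper bound comes from the description.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch: a stream buffer that starts small and grows
 * (here by doubling, an assumed policy) up to a fixed upper bound,
 * instead of allocating the full bound up front.
 */
final class GrowableStreamBuffer {
    private final int maxCapacity;
    private ByteBuffer buffer;

    GrowableStreamBuffer(int initialCapacity, int maxCapacity) {
        if (initialCapacity <= 0 || initialCapacity > maxCapacity) {
            throw new IllegalArgumentException("need 0 < initialCapacity <= maxCapacity");
        }
        this.maxCapacity = maxCapacity;
        this.buffer = ByteBuffer.allocate(initialCapacity);
    }

    /**
     * Appends up to len bytes and returns how many were accepted.
     * Fewer than len are accepted once the upper bound is reached.
     */
    int write(byte[] src, int off, int len) {
        ensureCapacity(len);
        int n = Math.min(len, buffer.remaining());
        buffer.put(src, off, n);
        return n;
    }

    /** Grows the backing buffer if needed, never beyond maxCapacity. */
    private void ensureCapacity(int needed) {
        if (buffer.remaining() >= needed || buffer.capacity() == maxCapacity) {
            return;
        }
        long required = (long) buffer.position() + needed;
        int newCapacity = buffer.capacity();
        while (newCapacity < required && newCapacity < maxCapacity) {
            newCapacity = (int) Math.min((long) newCapacity * 2, maxCapacity);
        }
        ByteBuffer grown = ByteBuffer.allocate(newCapacity);
        buffer.flip();        // switch to reading the bytes written so far
        grown.put(buffer);    // copy them into the larger buffer
        buffer = grown;
    }

    /** True when the buffer is at its upper bound and has no room left. */
    boolean isFull() {
        return buffer.capacity() == maxCapacity && !buffer.hasRemaining();
    }

    int capacity() {
        return buffer.capacity();
    }

    int size() {
        return buffer.position();
    }

    /** Discards the contents but keeps the current (possibly grown) capacity. */
    void reset() {
        buffer.clear();
    }
}
```

In this sketch a stream created as new GrowableStreamBuffer(4 * 1024, 256 * 1024) that only ever receives a few kilobytes holds a few kilobytes of heap rather than the full 256K. A caller would presumably flush the contents once isFull() returns true and then call reset(), which is the point where a fixed-size buffer would have been flushed as well.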



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
