[ https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377054#comment-14377054 ]
Hive QA commented on HIVE-10036:
--------------------------------

{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12706769/HIVE-10036.3.patch

{color:green}SUCCESS:{color} +1 7824 tests passed

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3123/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3123/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3123/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12706769 - PreCommit-HIVE-TRUNK-Build

> Writing ORC format big table causes OOM - too many fixed sized stream buffers
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-10036
>                 URL: https://issues.apache.org/jira/browse/HIVE-10036
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Selina Zhang
>            Assignee: Selina Zhang
>         Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch,
>                      HIVE-10036.3.patch
>
>
> The ORC writer keeps multiple output streams for each column. Each output
> stream is allocated a fixed-size ByteBuffer (configurable, defaults to 256K).
> For a big table, the memory cost is unbearable, especially when HCatalog
> dynamic partitioning is involved: several hundred files may be open and
> written at the same time (FileSinkOperator has the same problem).
> The global ORC memory manager controls the buffer size, but it only kicks in
> at 5000-row intervals. An enhancement could be made there, but the problem is
> that reducing the buffer size leads to worse compression and more IOs in the
> read path. Sacrificing read performance is never a good choice.
> I changed the fixed-size ByteBuffer to a dynamically growing buffer whose
> upper bound is the existing configurable buffer size. Most of the streams do
> not need a large buffer, so performance improved significantly. Compared to
> Facebook's hive-dwrf, I observed a 2x performance gain with this fix.
> Solving OOM for ORC completely may need a lot of effort, but this is
> definitely low-hanging fruit.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
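
For readers who want a concrete picture of the idea in the issue description (a buffer that starts small and grows on demand, but never beyond the configured stream buffer size), here is a minimal sketch in Java. It is not the code from the HIVE-10036 patches: the class name `BoundedGrowableBuffer`, its methods, and the doubling growth policy are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical illustration only; not taken from the HIVE-10036 patch.
 * A heap buffer that starts small and doubles as needed, capped at
 * maxCapacity (which would play the role of the configured ORC buffer size).
 */
final class BoundedGrowableBuffer {
    private final int maxCapacity;
    private ByteBuffer buffer;

    BoundedGrowableBuffer(int initialCapacity, int maxCapacity) {
        if (initialCapacity <= 0 || initialCapacity > maxCapacity) {
            throw new IllegalArgumentException("need 0 < initialCapacity <= maxCapacity");
        }
        this.maxCapacity = maxCapacity;
        this.buffer = ByteBuffer.allocate(initialCapacity);
    }

    /** Copies as many bytes as fit under the cap and returns how many were written. */
    int write(byte[] src, int offset, int length) {
        ensureCapacity(length);
        int n = Math.min(length, buffer.remaining());
        buffer.put(src, offset, n);
        return n;
    }

    /** Grows the backing buffer by doubling (assumed policy), never past maxCapacity. */
    private void ensureCapacity(int extra) {
        long needed = (long) buffer.position() + extra;
        if (needed <= buffer.capacity()) {
            return;
        }
        long newCapacity = buffer.capacity();
        while (newCapacity < needed && newCapacity < maxCapacity) {
            newCapacity *= 2;
        }
        newCapacity = Math.min(newCapacity, maxCapacity);
        if (newCapacity == buffer.capacity()) {
            return; // already at the cap; the caller is expected to flush
        }
        ByteBuffer bigger = ByteBuffer.allocate((int) newCapacity);
        buffer.flip();      // switch the old buffer to read mode
        bigger.put(buffer); // copy the bytes written so far
        buffer = bigger;
    }

    /** True once the buffer holds maxCapacity bytes and must be flushed. */
    boolean isFull() {
        return buffer.position() == maxCapacity;
    }

    int size() {
        return buffer.position();
    }

    int capacity() {
        return buffer.capacity();
    }

    /** Discards the contents but keeps the current capacity for reuse. */
    void reset() {
        buffer.clear();
    }
}
```

With this shape, a stream that only ever holds a few bytes keeps its small initial allocation, while a busy stream grows up to the cap and is then flushed as before. The memory saving therefore depends on how many streams stay small, which matches the observation in the description that most streams do not need a large buffer.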