[ https://issues.apache.org/jira/browse/HIVE-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423451#comment-13423451 ]
Owen O'Malley commented on HIVE-3153:
-------------------------------------

The use case this helps is one with a relatively large number (~2000) of dynamic partitions per reducer. In that case the task holds an open RCFile.Writer per dynamic partition, but the writers are not flushed in parallel. By moving the extra buffers and compression codecs so that they are acquired only when they are needed for a flush, instead of for the whole lifespan of the Writer, I'm able to keep many more Writers open at once.

> Release codecs and output streams between flushes of RCFile
> -----------------------------------------------------------
>
>                 Key: HIVE-3153
>                 URL: https://issues.apache.org/jira/browse/HIVE-3153
>             Project: Hive
>          Issue Type: Improvement
>          Components: Compression
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: hive-3153.patch
>
>
> Currently, the RCFile writer holds a compression codec per file and a
> compression output stream per column. Especially for queries that use
> dynamic partitions, this quickly consumes a lot of memory.
> I'd like flushRecords to get a codec from the pool and create the
> compression output stream in flushRecords.
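The idea described above — writers hold only raw buffered bytes, and a shared pool lends a compressor just for the duration of a flush — can be sketched roughly as follows. This is a hypothetical illustration, not actual Hive code: it uses `java.util.zip.Deflater` as a stand-in for Hadoop's `CompressionCodec`/`CodecPool`, and all class and method names are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

// Hypothetical sketch of the pooling refactoring (names are illustrative).
// Many writers can be open at once; only the writer currently flushing
// holds a compressor, so memory no longer scales with open-writer count.
public class PooledFlushDemo {
    // Shared pool of compressors, reused across all open writers.
    static final ArrayDeque<Deflater> POOL = new ArrayDeque<>();

    static Deflater acquire() {
        Deflater d = POOL.poll();
        return d != null ? d : new Deflater();
    }

    static void release(Deflater d) {
        d.reset();      // make the compressor reusable before pooling it
        POOL.push(d);
    }

    // A writer buffers raw records and only borrows a codec inside flush().
    static class Writer {
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        void append(byte[] record) {
            buffer.write(record, 0, record.length);
        }

        byte[] flush() {
            Deflater codec = acquire();   // borrowed only for this flush
            try {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                try (DeflaterOutputStream dos = new DeflaterOutputStream(out, codec)) {
                    buffer.writeTo(dos);  // compression stream exists only here
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                buffer.reset();
                return out.toByteArray();
            } finally {
                release(codec);           // returned immediately after the flush
            }
        }
    }

    public static void main(String[] args) {
        Writer a = new Writer(), b = new Writer();
        a.append("hello".getBytes());
        b.append("world".getBytes());
        byte[] ca = a.flush();
        byte[] cb = b.flush();
        System.out.println(ca.length > 0 && cb.length > 0); // → true
        // Flushes ran sequentially, so only one Deflater was ever created,
        // and it is back in the pool afterwards.
        System.out.println(POOL.size()); // → 1
    }
}
```

In the real patch the same principle applies to the per-column compression output streams: they are created inside flushRecords rather than living for the whole lifetime of the Writer.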