[ https://issues.apache.org/jira/browse/HIVE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647306#comment-13647306 ]
Phabricator commented on HIVE-4421:
-----------------------------------

omalley has commented on the revision "HIVE-4421 [jira] Improve memory usage by ORC dictionaries".

  Ashutosh, I incorporated most of your input.

  The 5000 rows between memory checks is just how often we check the writers against the size of their allocation. If there is enough memory, it doesn't result in any IO. I don't think there would be enough use to justify making it into a HiveConf variable.

  You asked why I removed the countOutput and the answer is that we didn't have immediate plans to use it, the use case for it was relatively rare and it saved some memory & complexity.

REVISION DETAIL
  https://reviews.facebook.net/D10545

To: JIRA, ashutoshc, omalley

> Improve memory usage by ORC dictionaries
> ----------------------------------------
>
>                 Key: HIVE-4421
>                 URL: https://issues.apache.org/jira/browse/HIVE-4421
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.11.0
>
>         Attachments: HIVE-4421.D10545.1.patch, HIVE-4421.D10545.2.patch,
>                      HIVE-4421.D10545.3.patch, HIVE-4421.D10545.4.patch
>
>
> Currently, for tables with many string columns, it is possible to
> significantly underestimate the memory used by the ORC dictionaries and cause
> the query to run out of memory in the task.
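
For readers following the review comment above, the periodic check it describes (every 5000 rows, compare each writer against its allocation, and do no IO if memory is sufficient) might look roughly like the sketch below. This is an illustration under stated assumptions, not the code from the patch: the class, interface, and field names are hypothetical, and the equal split of the pool across writers is a simplification chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a row-counted memory check; not the HIVE-4421 patch.
class MemoryCheckSketch {
    /** Hypothetical writer view: reports buffered bytes and can flush them. */
    interface Writer {
        long bufferedBytes();
        void flush();
    }

    /** Simple in-memory stand-in for a writer, used only for the demo. */
    static class FakeWriter implements Writer {
        long buffered = 0;
        int flushes = 0;
        public long bufferedBytes() { return buffered; }
        public void flush() { buffered = 0; flushes++; }
    }

    // The interval mentioned in the comment; a constant, not a config variable.
    static final int ROWS_BETWEEN_CHECKS = 5000;

    final long totalPoolBytes;
    final List<Writer> writers = new ArrayList<>();
    int rowsSinceCheck = 0;
    int checksPerformed = 0;

    MemoryCheckSketch(long totalPoolBytes) {
        this.totalPoolBytes = totalPoolBytes;
    }

    void register(Writer w) {
        writers.add(w);
    }

    /** Called once per row written; only every 5000th call does any work. */
    void addedRow() {
        if (++rowsSinceCheck >= ROWS_BETWEEN_CHECKS) {
            rowsSinceCheck = 0;
            checkWriters();
        }
    }

    /** Assumption: the pool is split equally among registered writers. */
    void checkWriters() {
        checksPerformed++;
        if (writers.isEmpty()) {
            return;
        }
        long allocation = totalPoolBytes / writers.size();
        for (Writer w : writers) {
            // A writer within its allocation is left alone, so no IO happens.
            if (w.bufferedBytes() > allocation) {
                w.flush();
            }
        }
    }

    public static void main(String[] args) {
        MemoryCheckSketch mm = new MemoryCheckSketch(10_000);
        FakeWriter busy = new FakeWriter();
        FakeWriter idle = new FakeWriter();
        mm.register(busy);
        mm.register(idle);
        for (int row = 0; row < 10_000; row++) {
            busy.buffered += 2;   // pretend each row adds two bytes
            mm.addedRow();
        }
        System.out.println("checks=" + mm.checksPerformed
            + " busyFlushes=" + busy.flushes
            + " idleFlushes=" + idle.flushes);
    }
}
```

In this sketch the per-row cost is a single counter increment, which is consistent with the argument that the interval does not need to be a HiveConf variable: a check that finds every writer within its allocation triggers no flush at all.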