[ 
https://issues.apache.org/jira/browse/HIVE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647306#comment-13647306
 ] 

Phabricator commented on HIVE-4421:
-----------------------------------

omalley has commented on the revision "HIVE-4421 [jira] Improve memory usage by 
ORC dictionaries".

  Ashutosh, I incorporated most of your input. The 5000 rows between memory 
checks is just how often we check each writer's size against its allocation. 
If there is enough memory, the check doesn't result in any IO. I don't think 
it would get enough use to justify making it a HiveConf variable.
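
  As a rough illustration of the periodic check described above, here is a 
minimal Java sketch. It is not the code in the patch: the class name, the 
Writer interface, and the ToyWriter helper are all hypothetical, and only the 
5000-row interval comes from this comment. It assumes each writer can report 
an estimate of its buffered size and knows its own allocation.

```java
import java.util.ArrayList;
import java.util.List;

public class MemoryCheckSketch {
    /** Hypothetical callback interface; not the actual ORC writer API. */
    interface Writer {
        long estimatedBytes();  // current estimate of buffered data
        long allocatedBytes();  // this writer's share of the memory pool
        void flush();           // the only step that performs IO
    }

    // Interval taken from the comment above; everything else is assumed.
    static final int ROWS_BETWEEN_CHECKS = 5000;

    private final List<Writer> writers = new ArrayList<>();
    private int rowsSinceCheck = 0;
    int checksRun = 0;

    void register(Writer w) {
        writers.add(w);
    }

    /** Called once per row; only every 5000th call inspects the writers. */
    void addedRow() {
        if (++rowsSinceCheck < ROWS_BETWEEN_CHECKS) {
            return;
        }
        rowsSinceCheck = 0;
        checksRun++;
        for (Writer w : writers) {
            // A writer within its allocation is left alone, so no IO happens.
            if (w.estimatedBytes() > w.allocatedBytes()) {
                w.flush();
            }
        }
    }

    /** Toy writer whose buffer grows by a fixed number of bytes per row. */
    static class ToyWriter implements Writer {
        final long allocation;
        final long bytesPerRow;
        long buffered = 0;
        int flushes = 0;

        ToyWriter(long allocation, long bytesPerRow) {
            this.allocation = allocation;
            this.bytesPerRow = bytesPerRow;
        }

        void addRow() { buffered += bytesPerRow; }
        public long estimatedBytes() { return buffered; }
        public long allocatedBytes() { return allocation; }
        public void flush() { flushes++; buffered = 0; }
    }

    public static void main(String[] args) {
        MemoryCheckSketch mgr = new MemoryCheckSketch();
        ToyWriter small = new ToyWriter(1_000_000, 10);   // stays under its allocation
        ToyWriter large = new ToyWriter(1_000_000, 300);  // exceeds it by each check
        mgr.register(small);
        mgr.register(large);
        for (int i = 0; i < 10_000; i++) {
            small.addRow();
            large.addRow();
            mgr.addedRow();
        }
        System.out.println("checks=" + mgr.checksRun
            + " smallFlushes=" + small.flushes
            + " largeFlushes=" + large.flushes);
    }
}
```

  In this toy run the writer that stays within its allocation is never 
flushed, which is the sense in which the check is cheap when memory is 
sufficient.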

  You asked why I removed countOutput. We had no immediate plans to use it, 
the use case for it was relatively rare, and removing it saved some memory and 
complexity.

REVISION DETAIL
  https://reviews.facebook.net/D10545

To: JIRA, ashutoshc, omalley

                
> Improve memory usage by ORC dictionaries
> ----------------------------------------
>
>                 Key: HIVE-4421
>                 URL: https://issues.apache.org/jira/browse/HIVE-4421
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.11.0
>
>         Attachments: HIVE-4421.D10545.1.patch, HIVE-4421.D10545.2.patch, 
> HIVE-4421.D10545.3.patch, HIVE-4421.D10545.4.patch
>
>
> Currently, for tables with many string columns, it is possible to 
> significantly underestimate the memory used by the ORC dictionaries and cause 
> the query to run out of memory in the task. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
