[ https://issues.apache.org/jira/browse/HIVE-12077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chinna Rao Lalam updated HIVE-12077:
------------------------------------
    Status: Patch Available  (was: Open)

The batch size for the MSCK REPAIR command can be configured with the newly introduced property "hive.msck.repair.batch.size". If the value is greater than zero, the repair is executed in batches of the configured size. The default value of the property is zero, which means the repair is executed directly, not in batches.

> MSCK Repair table should fix partitions in batches
> ---------------------------------------------------
>
>                 Key: HIVE-12077
>                 URL: https://issues.apache.org/jira/browse/HIVE-12077
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Ryan P
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-12077.1.patch, HIVE-12077.2.patch,
> HIVE-12077.3.patch
>
>
> If a user attempts to run MSCK REPAIR TABLE on a directory with a large
> number of untracked partitions, HMS will OOME. I suspect this is because it
> attempts to do one large bulk load in an effort to save time. Ultimately this
> can lead to a collection so large in size that HMS eventually hits an Out of
> Memory Exception.
> Instead I suggest that Hive include a configurable batch size that HMS can
> use to break up the load.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
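
For illustration only, the batch-size semantics described in the update comment (a value greater than zero runs in batches, zero runs as one bulk call) can be sketched as follows. This is not the code from the attached patches. The function name `add_partitions_in_batches` and the `add_batch` callback are hypothetical stand-ins for whatever call registers partitions with HMS, and treating negative values like zero is an assumption the comment does not cover.

```python
from typing import Callable, List, Sequence


def add_partitions_in_batches(
    partitions: Sequence[str],
    batch_size: int,
    add_batch: Callable[[List[str]], None],
) -> int:
    """Pass partitions to add_batch and return the number of calls made.

    add_batch is a hypothetical callback standing in for the metastore call.
    """
    if not partitions:
        return 0

    if batch_size <= 0:
        # Zero (the default) means no batching: one bulk call.
        # Negative values are treated the same way here (an assumption).
        add_batch(list(partitions))
        return 1

    calls = 0
    # Walk the list in steps of batch_size; the last slice may be shorter.
    for start in range(0, len(partitions), batch_size):
        add_batch(list(partitions[start:start + batch_size]))
        calls += 1
    return calls
```

With a positive batch size, each call holds at most `batch_size` partitions at a time, which is the property the issue asks for to keep HMS from building one very large collection.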