[ 
https://issues.apache.org/jira/browse/HIVE-16024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893193#comment-15893193
 ] 

Sergio Peña commented on HIVE-16024:
------------------------------------

[~zsombor.klara]

I took another look at the code, and I think there might still be an OOM problem
even if we fetch all partitions in batches (using PartitionIterable) when strict
mode is used. Here's the relevant piece of code:

{noformat}
void checkTable(Table table, PartitionIterable parts,
    boolean findUnknownPartitions, CheckResult result)
    throws IOException, HiveException {
  ...
  // Lives for the whole call and grows with every partition visited.
  Set<Path> partPaths = new HashSet<Path>();
  ...
  for (Partition partition : parts) {
    ...
    if (!fs.exists(partPath)) {
      // One PartitionResult is kept for every partition missing on the FS.
      PartitionResult pr = new PartitionResult();
      pr.setPartitionName(partition.getName());
      pr.setTableName(partition.getTable().getTableName());
      result.getPartitionsNotOnFs().add(pr);
    }

    // Record the partition path and its parent directories.
    for (int i = 0; i < partition.getSpec().size(); i++) {
      partPaths.add(partPath.makeQualified(fs));
      partPath = partPath.getParent();
    }
  }
  ...
{noformat}

My concern is that when MSCK runs against millions of partitions (fetched in
batches) and none of them exist on the filesystem, the code above adds every
partition name to the CheckResult object and every partition location to the
temporary partPaths set. I have no measurements to back this up, but OOM is
still a concern. Should we instead refactor this code so that MSCK handles
partitions in batches better?
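
One possible shape for such a refactoring, as a minimal standalone sketch rather
than actual Hive code: missing partitions are collected into a bounded buffer and
handed off every batchSize entries, so memory held for results no longer grows
with the total number of partitions. The class BatchedMissingPartitionCheck, the
method reportMissing, the existsOnFs predicate (standing in for fs.exists) and
the sink callback are all hypothetical names introduced for illustration. The
sketch does not address the partPaths set, which would need its own treatment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical sketch: report missing partitions in bounded batches. */
final class BatchedMissingPartitionCheck {

    /**
     * Walks partition locations lazily and buffers the ones that are missing.
     * The buffer never holds more than batchSize entries: when it fills up,
     * a copy is passed to the sink and the buffer is cleared.
     *
     * @return the total number of missing locations seen
     */
    static long reportMissing(Iterator<String> locations,
                              Predicate<String> existsOnFs,
                              int batchSize,
                              Consumer<List<String>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> buffer = new ArrayList<>(batchSize);
        long missing = 0;
        while (locations.hasNext()) {
            String location = locations.next();
            if (!existsOnFs.test(location)) {
                buffer.add(location);
                missing++;
                if (buffer.size() == batchSize) {
                    sink.accept(new ArrayList<>(buffer)); // hand off a copy
                    buffer.clear();
                }
            }
        }
        if (!buffer.isEmpty()) {
            sink.accept(new ArrayList<>(buffer)); // flush the final partial batch
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up locations; the predicate pretends nothing exists on the FS.
        List<String> locations = Arrays.asList(
            "/tbl/p=1", "/tbl/p=2", "/tbl/p=3", "/tbl/p=4", "/tbl/p=5");
        long missing = reportMissing(locations.iterator(), loc -> false, 2,
            batch -> System.out.println("missing batch: " + batch));
        System.out.println("total missing: " + missing);
    }
}
```

In this sketch the caller decides what the sink does with each batch (log it,
write it out, or aggregate only a count), which is what keeps the in-memory
footprint bounded by batchSize.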

> MSCK Repair Requires nonstrict hive.mapred.mode
> -----------------------------------------------
>
>                 Key: HIVE-16024
>                 URL: https://issues.apache.org/jira/browse/HIVE-16024
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 2.2.0
>            Reporter: Barna Zsombor Klara
>            Assignee: Barna Zsombor Klara
>         Attachments: HIVE-16024.01.patch, HIVE-16024.02.patch, 
> HIVE-16024.03.patch, HIVE-16024.04.patch
>
>
> MSCK repair fails when hive.mapred.mode is set to strict.
> HIVE-13788 modified the way we load partitions for a table to improve 
> performance. Unfortunately, it uses PartitionPruner to load the partitions, 
> which in turn checks hive.mapred.mode.
> The previous code did not check hive.mapred.mode.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
