Vihang Karajgaonkar created HIVE-16143:
------------------------------------------
Summary: Improve msck repair batching
Key: HIVE-16143
URL: https://issues.apache.org/jira/browse/HIVE-16143
Project: Hive
Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar
Currently, the {{msck repair table}} command batches the partitions it creates
in the metastore according to the config {{HIVE_MSCK_REPAIR_BATCH_SIZE}}.
The following snippet shows the batching logic. There are a couple of possible
improvements to this logic:
{noformat}
try {
  int batch_size = conf.getIntVar(ConfVars.HIVE_MSCK_REPAIR_BATCH_SIZE);
  if (batch_size > 0 && partsNotInMs.size() > batch_size) {
    int counter = 0;
    for (CheckResult.PartitionResult part : partsNotInMs) {
      counter++;
      apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
      repairOutput.add("Repair: Added partition to metastore "
          + msckDesc.getTableName() + ':' + part.getPartitionName());
      // Flush a full batch, or the final partial batch.
      if (counter % batch_size == 0 || counter == partsNotInMs.size()) {
        db.createPartitions(apd);
        apd = new AddPartitionDesc(table.getDbName(),
            table.getTableName(), false);
      }
    }
  } else {
    // Batching disabled or unnecessary: add everything in one call.
    for (CheckResult.PartitionResult part : partsNotInMs) {
      apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
      repairOutput.add("Repair: Added partition to metastore "
          + msckDesc.getTableName() + ':' + part.getPartitionName());
    }
    db.createPartitions(apd);
  }
} catch (Exception e) {
  // Any failure, in any batch, restarts the whole list one partition at a time.
  LOG.info("Could not bulk-add partitions to metastore; trying one by one", e);
  repairOutput.clear();
  msckAddPartitionsOneByOne(db, table, partsNotInMs, repairOutput);
}
{noformat}
1. If the batch size is too aggressive, the code falls back to adding partitions
one by one, which is almost always very slow. Users can easily increase the
batch size to make the command run faster but end up with worse performance
because the code falls back to adding one by one. Users are then expected to
determine the tuned batch size that works well for their environment. I think
the code could handle this situation better by exponentially decaying the batch
size instead of falling back to one by one.
2. The other issue with this implementation is that if, say, the first batch
succeeds and the second one fails, the code tries to add all the partitions one
by one, irrespective of whether some of them were already added successfully.
If we need to fall back to one by one, we should at least remove the ones that
we know for sure have already been added.
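
To make both points concrete, here is a rough sketch of what the batching loop could look like. It is only an illustration and not the actual patch: the class {{DecayingBatchSketch}}, the method {{addInBatches}} and the {{decayFactor}} parameter are hypothetical names, and the {{Consumer}} argument is a stand-in for the {{db.createPartitions(apd)}} call. It assumes that a failed batch call adds none of its partitions. The loop remembers how many partitions were already added, so a failed batch is retried with a smaller size starting from the first partition that is still missing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names exist in Hive.
class DecayingBatchSketch {

    /**
     * Adds items in batches, shrinking the batch size by decayFactor whenever
     * a batch fails. Items from earlier successful batches are never retried.
     *
     * @param addBatch stand-in for the metastore call; assumed to throw on failure
     * @return the number of calls made to addBatch
     */
    static <T> int addInBatches(List<T> items, int initialBatchSize,
                                int decayFactor, Consumer<List<T>> addBatch) {
        if (decayFactor < 2) {
            throw new IllegalArgumentException("decayFactor must be >= 2");
        }
        // A non-positive batch size means "no batching", as in the current code.
        int batchSize = initialBatchSize > 0 ? initialBatchSize : items.size();
        int done = 0;   // items [0, done) are known to be added
        int calls = 0;
        while (done < items.size()) {
            int end = Math.min(done + batchSize, items.size());
            List<T> batch = new ArrayList<>(items.subList(done, end));
            calls++;
            try {
                addBatch.accept(batch);
                done = end;  // advance only after the batch succeeded
            } catch (RuntimeException e) {
                if (batch.size() == 1) {
                    throw e;  // even a single item fails; give up
                }
                // Decay the batch size and retry from the same position.
                batchSize = Math.max(1, batch.size() / decayFactor);
            }
        }
        return calls;
    }
}
```

With 10 partitions, an initial batch size of 8, a decay factor of 2 and a backend that rejects batches larger than 3, this makes two failed calls (sizes 8 and 4) followed by five successful calls of size 2, instead of ten single-partition calls. A real version would need to catch whatever checked exception the metastore call raises and decide whether the smaller batch size should be kept for the rest of the run, as it is here.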
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)