[ https://issues.apache.org/jira/browse/HIVE-16143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994321#comment-15994321 ]
Hive QA commented on HIVE-16143:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12866074/HIVE-16143.06.patch

{color:green}SUCCESS:{color} +1 due to 9 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10650 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[table_nonprintable] (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=155)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5013/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5013/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5013/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12866074 - PreCommit-HIVE-Build

> Improve msck repair batching
> ----------------------------
>
>                 Key: HIVE-16143
>                 URL: https://issues.apache.org/jira/browse/HIVE-16143
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>         Attachments: HIVE-16143.01.patch, HIVE-16143.02.patch, HIVE-16143.03.patch,
>                      HIVE-16143.04.patch, HIVE-16143.05.patch, HIVE-16143.06.patch
>
>
> Currently, the {{msck repair table}} command batches the partitions it creates
> in the metastore using the config {{HIVE_MSCK_REPAIR_BATCH_SIZE}}. The following
> snippet shows the batching logic. There are a couple of improvements that could
> be made to this batching logic:
> {noformat}
> int batch_size = conf.getIntVar(ConfVars.HIVE_MSCK_REPAIR_BATCH_SIZE);
> if (batch_size > 0 && partsNotInMs.size() > batch_size) {
>   int counter = 0;
>   for (CheckResult.PartitionResult part : partsNotInMs) {
>     counter++;
>     apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
>     repairOutput.add("Repair: Added partition to metastore " + msckDesc.getTableName()
>         + ':' + part.getPartitionName());
>     if (counter % batch_size == 0 || counter == partsNotInMs.size()) {
>       db.createPartitions(apd);
>       apd = new AddPartitionDesc(table.getDbName(), table.getTableName(), false);
>     }
>   }
> } else {
>   for (CheckResult.PartitionResult part : partsNotInMs) {
>     apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
>     repairOutput.add("Repair: Added partition to metastore " + msckDesc.getTableName()
>         + ':' + part.getPartitionName());
>   }
>   db.createPartitions(apd);
> }
> } catch (Exception e) {
>   LOG.info("Could not bulk-add partitions to metastore; trying one by one", e);
>   repairOutput.clear();
>   msckAddPartitionsOneByOne(db, table, partsNotInMs, repairOutput);
> }
> {noformat}
> 1. If the batch size is too aggressive, the code falls back to adding partitions
> one by one, which is almost always very slow. Users can easily increase the batch
> size to a higher value to make the command run faster, but end up with worse
> performance because the code falls back to adding partitions one by one.
> Users are then expected to work out, by tuning, a batch size that works well for
> their environment. I think the code could handle this situation better by
> exponentially decaying the batch size instead of falling back to one by one.
> 2. The other issue with this implementation is that if, say, the first batch
> succeeds and the second one fails, the code tries to add all the partitions one
> by one, irrespective of whether some of them were already added successfully.
> If we need to fall back to one by one, we should at least remove the ones which
> we know for sure were already added successfully.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
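Below is a minimal sketch of the two improvements proposed in the description above: halving (exponentially decaying) the batch size when a bulk createPartitions() call fails, and dropping partitions that are already known to have been created so a later failure never re-submits them. It reuses the names visible in the quoted snippet (Hive, Table, CheckResult.PartitionResult, AddPartitionDesc, Warehouse.makeSpecFromName); the wrapper class, the helper name addInBatches, and the import paths are assumptions, and this is not the code from the attached HIVE-16143 patches.

{noformat}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.metastore.Warehouse;
import org.apache.hadoop.hive.ql.metadata.CheckResult;
import org.apache.hadoop.hive.ql.metadata.Hive;
import org.apache.hadoop.hive.ql.metadata.Table;
import org.apache.hadoop.hive.ql.plan.AddPartitionDesc;

public class MsckBatchSketch {

  // Hypothetical helper: adds the missing partitions in batches, halving the
  // batch size on failure instead of immediately falling back to one-by-one.
  static void addInBatches(Hive db, Table table,
      List<CheckResult.PartitionResult> partsNotInMs,
      int initialBatchSize, List<String> repairOutput) throws Exception {

    // Worklist of partitions that still have to be created.
    List<CheckResult.PartitionResult> pending = new ArrayList<>(partsNotInMs);
    int batchSize = Math.max(initialBatchSize, 1);

    while (!pending.isEmpty()) {
      int n = Math.min(batchSize, pending.size());
      // subList is a view of 'pending', so clearing it later removes the batch.
      List<CheckResult.PartitionResult> batch = pending.subList(0, n);

      AddPartitionDesc apd =
          new AddPartitionDesc(table.getDbName(), table.getTableName(), false);
      for (CheckResult.PartitionResult part : batch) {
        apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
      }

      try {
        db.createPartitions(apd);
        // Only partitions we know were created leave the worklist, so a later
        // failure never re-submits them (improvement 2 in the description).
        for (CheckResult.PartitionResult part : batch) {
          repairOutput.add("Repair: Added partition to metastore "
              + table.getTableName() + ':' + part.getPartitionName());
        }
        batch.clear();
      } catch (Exception e) {
        if (batchSize == 1) {
          // Even a single-partition batch failed; nothing smaller to try.
          throw e;
        }
        // Exponential decay of the batch size (improvement 1 in the description).
        batchSize = Math.max(batchSize / 2, 1);
      }
    }
  }
}
{noformat}

Halving keeps the number of extra metastore round trips logarithmic in the batch size in the worst case, and because partitions are removed from the worklist only after their batch commits, no partition is ever created (or attempted) twice.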