> On May 12, 2017, 2:18 p.m., Aihua Xu wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Lines 3357-3361 (patched)
> > <https://reviews.apache.org/r/58936/diff/3/?file=1714847#file1714847line3357>
> >
> > Vihang and Sahil,
> >
> > Typically what would cause the batch to fail? Is that because the batch
> > could be too large?
> >
> > Right now, we are hard coding decayingFactor to 2. I have another
> > thought: maybe with the retries, we will calculate such decayingFactor so
> > the last retry will always process one partition at a time just like what
> > we are doing. So given batch size 100 and retries 4, 100, 66, 33, 1?
> >
> > How do you think?

The batch could fail when the network is flaky, or when processing the batch
takes longer than the metastore client's socket timeout. This could be more
common with cloud-based datastores like S3.

I think what you are proposing is a linearly decaying batch size. That may work
fine for smaller batch sizes, but it may not converge quickly if the batch size
is (mis)configured to be much higher or left at the default value of 0. For
example, consider numPartitions = 10,000 and maxRetries = 10: the batch sizes
with your approach would be 10k, 9k, 8k, 7k, ..., all of which may be too high.
If we decay exponentially, the batches will be 10k, 5k, 2.5k, 1.25k, ..., which
is more likely to succeed.


- Vihang


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58936/#review174792
-----------------------------------------------------------


On May 12, 2017, 9:35 p.m., Vihang Karajgaonkar wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58936/
> -----------------------------------------------------------
>
> (Updated May 12, 2017, 9:35 p.m.)
>
>
> Review request for hive, Aihua Xu, Sergio Pena, and Sahil Takiar.
>
>
> Bugs: HIVE-16143
>     https://issues.apache.org/jira/browse/HIVE-16143
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> HIVE-16143 : Improve msck repair batching
>
>
> Diffs
> -----
>
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java d3ea824c21f2fbf98177cb12a18019416f36a3f9
>   common/src/java/org/apache/hive/common/util/RetryUtilities.java PRE-CREATION
>   common/src/test/org/apache/hive/common/util/TestRetryUtilities.java PRE-CREATION
>   itests/hive-blobstore/src/test/queries/clientpositive/create_like.q 38f384e4c547d3c93d510b89fccfbc2b8e2cba09
>   itests/hive-blobstore/src/test/results/clientpositive/create_like.q.out 0d362a716291637404a3859fe81068594d82c9e0
>   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 2ae1eacb68cef6990ae3f2050af0bed7c8e9843f
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 917e565f28b2c9aaea18033ea3b6b20fa41fcd0a
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestMsckCreatePartitionsInBatches.java PRE-CREATION
>   ql/src/test/queries/clientpositive/msck_repair_0.q 22542331621ca4ce5277c2f46a4264b7540a4d1e
>   ql/src/test/queries/clientpositive/msck_repair_1.q ea596cbbd2d4c230f2b5afbe379fc1e8836b6fbd
>   ql/src/test/queries/clientpositive/msck_repair_2.q d8338211e970ebac68a7471ee0960ccf2d51cba3
>   ql/src/test/queries/clientpositive/msck_repair_3.q fdefca121a2de361dbd19e7ef34fb220e1733ed2
>   ql/src/test/queries/clientpositive/msck_repair_batchsize.q e56e97ac36a6544f3e20478fdb0e8fa783a857ef
>   ql/src/test/results/clientpositive/msck_repair_0.q.out 2e0d9dc423071ebbd9a55606f196cf7752e27b1a
>   ql/src/test/results/clientpositive/msck_repair_1.q.out 3f2fe75b194f1248bd5c073dd7db6b71b2ffc2ba
>   ql/src/test/results/clientpositive/msck_repair_2.q.out 3f2fe75b194f1248bd5c073dd7db6b71b2ffc2ba
>   ql/src/test/results/clientpositive/msck_repair_3.q.out 3f2fe75b194f1248bd5c073dd7db6b71b2ffc2ba
>   ql/src/test/results/clientpositive/msck_repair_batchsize.q.out ba99024163a1f2c59d59e9ed7ea276c154c99d24
>   ql/src/test/results/clientpositive/repair.q.out c1834640a35500c521a904a115a718c94546df10
>
>
> Diff: https://reviews.apache.org/r/58936/diff/4/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Vihang Karajgaonkar
>
>
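
To make the linear-versus-exponential comparison in the reply concrete, here is a minimal sketch that lists the batch size tried on each attempt under both schemes. It is an illustration only, not the RetryUtilities code from the patch: the class BatchSizeSchedules and its methods linear and exponential are hypothetical names, and the linear step (initial size divided by maxRetries) is an assumption chosen to match the 10k, 9k, 8k, ... example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the Hive patch.
public class BatchSizeSchedules {

  // Linear decay: shrink by a fixed step (initialSize / maxRetries) per attempt.
  // The step size is an assumption that mirrors the 10k, 9k, 8k, ... example.
  public static List<Integer> linear(int initialSize, int maxRetries) {
    int step = Math.max(1, initialSize / maxRetries);
    List<Integer> sizes = new ArrayList<>();
    for (int attempt = 0; attempt < maxRetries; attempt++) {
      // Never go below one partition per batch.
      sizes.add(Math.max(1, initialSize - attempt * step));
    }
    return sizes;
  }

  // Exponential decay: divide the batch size by decayingFactor after each attempt.
  public static List<Integer> exponential(int initialSize, int decayingFactor, int maxRetries) {
    List<Integer> sizes = new ArrayList<>();
    int size = initialSize;
    for (int attempt = 0; attempt < maxRetries; attempt++) {
      // Integer division truncates; clamp at one partition per batch.
      sizes.add(Math.max(1, size));
      size /= decayingFactor;
    }
    return sizes;
  }

  public static void main(String[] args) {
    // numPartitions = 10,000 and maxRetries = 10, as in the reply.
    System.out.println("linear:      " + linear(10000, 10));
    System.out.println("exponential: " + exponential(10000, 2, 10));
  }
}
```

With these inputs the linear schedule is still at 1,000 partitions per batch on its tenth attempt, while the exponential schedule with a factor of 2 is already below 100 by its eighth attempt.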