> On Feb. 21, 2018, 3:32 p.m., Sahil Takiar wrote:
> > Would be good to know which Hive queries invoke this method.
> 
> Marta Kuczora wrote:
>     Thanks a lot Sahil for the review. I will check where these methods are
>     used and come back to you with the answer a bit later.
I checked how these methods are used. The 'add_partition_core' method is used
when adding multiple partitions to a table. A simple query like the following
can trigger it:

    ALTER TABLE bubu ADD
      PARTITION (year='2017',month='march')
      PARTITION (year='2017',month='april')
      PARTITION (year='2018',month='march')
      PARTITION (year='2018',month='may')
      PARTITION (year='2017',month='march', day="3");

It is also used by the DDLTask.createPartitionsInBatches method, which is
invoked by the 'msck repair' command. I didn't find much about the
'add_partitions_pspec_core' method. I only found that it is used by the
HCatClientHMSImpl.addPartitionSpec method, but I don't know whether or how
HCatClientHMSImpl itself is used.


> On Feb. 21, 2018, 3:32 p.m., Sahil Takiar wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 3032 (original), 3065 (patched)
> > <https://reviews.apache.org/r/65716/diff/1/?file=1962881#file1962881line3072>
> >
> >     This code looks very similar to the block above. I know it was never
> >     the intention of this JIRA to do any re-factoring, but how difficult
> >     would it be to move all this code into a common method so that we
> >     don't have to fix the bug in two places? Not a blocking issue though.
> 
> Marta Kuczora wrote:
>     Yeah, I absolutely agree. This code duplication annoys me as well, I
>     just wasn't sure that it is acceptable to do the refactoring in the
>     scope of this Jira. But it is not so difficult, so I will upload a
>     patch where I moved the common parts to a separate method, and we can
>     decide whether it is ok like that or whether we should rather do it in
>     a different Jira.
> 
> Marta Kuczora wrote:
>     I checked how this could be refactored and there are some differences
>     between the methods which make it not that straightforward. It is not
>     that difficult and I basically have the patch, but I would do it in the
>     scope of another Jira, so we can discuss the details there. Would this
>     be ok for you, Sahil?

Created a follow-up Jira for the refactoring:
https://issues.apache.org/jira/browse/HIVE-19046


- Marta


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65716/#review197829
-----------------------------------------------------------


On March 8, 2018, 4:52 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65716/
> -----------------------------------------------------------
> 
> (Updated March 8, 2018, 4:52 p.m.)
> 
> 
> Review request for hive, Alexander Kolbasov, Peter Vary, and Adam Szita.
> 
> 
> Bugs: HIVE-18696
>     https://issues.apache.org/jira/browse/HIVE-18696
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The idea behind the patch is:
> 
> 1) Separate the partition validation from starting the tasks which create
> the partition folders.
> Instead of doing the checks on the partitions and submitting the tasks in
> one loop, the validation is moved to a separate loop. So first iterate
> through the partitions, validate the table/db names, and check for
> duplicates. Then, if all partitions were correct, submit the tasks which
> create the partition folders in a second loop. This way, if one of the
> partitions is incorrect, the exception will be thrown in the first loop,
> before the tasks are submitted.
> So we can be sure that no partition folder will be created if the list
> contains an invalid partition.
> 
> 2) Handle the exceptions which occur during the execution of the tasks
> differently.
> Previously, if an exception occurred in one task, the remaining tasks were
> canceled and the newly created partition folders were cleaned up in the
> finally part. The problem was that some tasks might still not have finished
> creating their folders when the others were cleaned up, so there could have
> been leftover folders. After doing some testing, it turned out that this
> case cannot be avoided completely when canceling the tasks.
> The idea of this patch is to set a flag if an exception is thrown in one of
> the tasks. This flag is visible to the tasks, and if its value is true, the
> partition folders won't be created. Then iterate through the remaining
> tasks and wait for them to finish. Tasks which were started before the flag
> got set will finish creating their partition folders. Tasks which are
> started after the flag got set won't create the partition folders, to avoid
> unnecessary work. This way we can be sure that all tasks have finished when
> entering the finally part where the partition folders are cleaned up.
> 
> 
> Diffs
> -----
> 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 662de9a 
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitions.java 4d9cb1b 
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitionsFromPartSpec.java 1122057 
> 
> 
> Diff: https://reviews.apache.org/r/65716/diff/3/
> 
> 
> Testing
> -------
> 
> Added some new test cases to the TestAddPartitions and
> TestAddPartitionsFromPartSpec tests.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>
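
To make the approach described in points 1) and 2) above more concrete, here
is a minimal, self-contained sketch of the validate-then-submit pattern with
a shared failure flag. All names here (AddPartitionsSketch, the simplified
Partition class, createPartitionFolder, deleteFolder) are hypothetical and
the threading setup is deliberately simplified; this is only an illustration
of the idea, not the actual HiveMetaStore.add_partition_core code.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.atomic.AtomicBoolean;

    public class AddPartitionsSketch {

      // Simplified stand-in for the metastore Partition object.
      static class Partition {
        String dbName;
        String tableName;
        List<String> values;
      }

      void addPartitions(List<Partition> parts, String db, String table)
          throws Exception {
        // (1) First loop: validate every partition before any task is
        // submitted, so an invalid entry cannot leave folders behind.
        Set<List<String>> seen = new HashSet<>();
        for (Partition part : parts) {
          boolean wrongTable = !db.equalsIgnoreCase(part.dbName)
              || !table.equalsIgnoreCase(part.tableName);
          if (wrongTable) {
            throw new IllegalArgumentException(
                "Partition does not belong to " + db + "." + table);
          }
          if (!seen.add(part.values)) {
            throw new IllegalArgumentException(
                "Duplicate partition: " + part.values);
          }
        }

        // (2) Second loop: submit the folder-creation tasks. A shared flag
        // is set by the first failing task; tasks that start afterwards skip
        // the folder creation, and every task is waited for before cleanup.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicBoolean failureOccurred = new AtomicBoolean(false);
        List<String> createdDirs =
            Collections.synchronizedList(new ArrayList<String>());
        List<Future<Void>> futures = new ArrayList<>();
        try {
          for (Partition part : parts) {
            Callable<Void> task = () -> {
              if (failureOccurred.get()) {
                return null;               // another task already failed, do no work
              }
              try {
                createdDirs.add(createPartitionFolder(part));
              } catch (Exception e) {
                failureOccurred.set(true); // make the failure visible to the other tasks
                throw e;
              }
              return null;
            };
            futures.add(pool.submit(task));
          }
          // Wait for *all* tasks to finish and remember the first failure.
          Exception firstFailure = null;
          for (Future<Void> future : futures) {
            try {
              future.get();
            } catch (Exception e) {
              if (firstFailure == null) {
                firstFailure = e;
              }
            }
          }
          if (firstFailure != null) {
            throw firstFailure;
          }
        } finally {
          pool.shutdownNow();
          if (failureOccurred.get()) {
            // All tasks are done at this point, so no new folder can appear
            // after this cleanup runs.
            for (String dir : createdDirs) {
              deleteFolder(dir);
            }
          }
        }
      }

      // Placeholders for the actual file system operations.
      private String createPartitionFolder(Partition part) throws Exception {
        return "/warehouse/" + part.tableName + "/" + part.values;
      }

      private void deleteFolder(String dir) {
      }
    }

The key property of this pattern is that the finally part only runs after
every future has been waited for, so the cleanup cannot race with a task that
is still creating its partition folder.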