> On Feb. 27, 2017, 6:01 p.m., Sahil Takiar wrote: > > ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java, > > line 452 > > <https://reviews.apache.org/r/56995/diff/2/?file=1647474#file1647474line452> > > > > Can't the variable declartion be `Queue`, same for the other > > declarations below.
We can declare it as a Queue, but the implementation of the queue needs to be thread-safe since multiple threads are going to operate on the queue at the same time. I thought of declaring it as a concurrentQueue would make it more clear and understandable without any performance implications. Is there a particular advantage you can think of declaring it as queue? This also makes sure that we have a compile time type check to ensure that calling methods are using concurrent queues. > On Feb. 27, 2017, 6:01 p.m., Sahil Takiar wrote: > > ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java, > > line 576 > > <https://reviews.apache.org/r/56995/diff/2/?file=1647474#file1647474line576> > > > > Usually better to `throw new HiveException(e)` so that the full > > stack-trace is included. Agreed, fixed. > On Feb. 27, 2017, 6:01 p.m., Sahil Takiar wrote: > > ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java, > > lines 550-556 > > <https://reviews.apache.org/r/56995/diff/2/?file=1647474#file1647474line550> > > > > Not sure I follow this logic, why are two queues necessary? Consider you have just one queue called nextLevel. In such a scenario the worker threads are removing elements and adding to the same queue. This is the classic producer/consumer model operating on a single queue. Initially I tried using a single blocking queue, but the terminating condition is non-trivial to implement because the worker threads may or may not produce more items for the queue. In such a case in order to avoid race conditions at while(!nextLevel.isEmpty()) check we will either need some kind of marker element added to indicate that worker thread is done, or use wait()/notify() constructs to indicate when we have reached the terminating condition. That implementation was getting too complex if we wanted to avoid all race conditions possible. Given that this code path will be commonly used for msck repair command, I thought of preferring the safer yet performant way at the small cost of allocating a new queue for each level of the directory tree. Hope that explains. - Vihang ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/56995/#review166898 ----------------------------------------------------------- On Feb. 27, 2017, 6:44 p.m., Vihang Karajgaonkar wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/56995/ > ----------------------------------------------------------- > > (Updated Feb. 27, 2017, 6:44 p.m.) > > > Review request for hive, Aihua Xu, Ashutosh Chauhan, and Sergio Pena. > > > Bugs: HIVE-15879 > https://issues.apache.org/jira/browse/HIVE-15879 > > > Repository: hive-git > > > Description > ------- > > HIVE-15879 : Fix HiveMetaStoreChecker.checkPartitionDirs method > > > Diffs > ----- > > ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java > 7c94c95f00492467ba27dedc9ce513e13c85ea61 > > ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreChecker.java > 35f52cd522e0e48a333e30966245bec65cc2ec9c > > Diff: https://reviews.apache.org/r/56995/diff/ > > > Testing > ------- > > Tested using existing and newly added test cases > > > Thanks, > > Vihang Karajgaonkar > >