Re: Review Request 56995: HIVE-15879 : Fix HiveMetaStoreChecker.checkPartitionDirs method

Vihang Karajgaonkar Mon, 27 Feb 2017 10:49:25 -0800


> On Feb. 27, 2017, 6:01 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java, 
> > line 452
> > <https://reviews.apache.org/r/56995/diff/2/?file=1647474#file1647474line452>
> >
> >     Can't the variable declartion be `Queue`, same for the other 
> > declarations below.

We can declare it as a Queue, but the implementation of the queue needs to be 
thread-safe since multiple threads are going to operate on the queue at the 
same time. I thought of declaring it as a concurrentQueue would make it more 
clear and understandable without any performance implications. Is there a 
particular advantage you can think of declaring it as queue? This also makes 
sure that we have a compile time type check to ensure that calling methods are 
using concurrent queues.

> On Feb. 27, 2017, 6:01 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java, 
> > line 576
> > <https://reviews.apache.org/r/56995/diff/2/?file=1647474#file1647474line576>
> >
> >     Usually better to `throw new HiveException(e)` so that the full 
> > stack-trace is included.

Agreed, fixed.

> On Feb. 27, 2017, 6:01 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java, 
> > lines 550-556
> > <https://reviews.apache.org/r/56995/diff/2/?file=1647474#file1647474line550>
> >
> >     Not sure I follow this logic, why are two queues necessary?

Consider you have just one queue called nextLevel. In such a scenario the 
worker threads are removing elements and adding to the same queue. This is the 
classic producer/consumer model operating on a single queue. Initially I tried 
using a single blocking queue, but the terminating condition is non-trivial to 
implement because the worker threads may or may not produce more items for the 
queue. In such a case in order to avoid race conditions at 
while(!nextLevel.isEmpty()) check we will either need some kind of marker 
element added to indicate that worker thread is done, or use wait()/notify() 
constructs to indicate when we have reached the terminating condition. That 
implementation was getting too complex if we wanted to avoid all race 
conditions possible. Given that this code path will be commonly used for msck 
repair command, I thought of preferring the safer yet performant way at the 
small cost of allocating a new queue for each level of the directory tree. Hope 
that explains.

- Vihang

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56995/#review166898
-----------------------------------------------------------

On Feb. 27, 2017, 6:44 p.m., Vihang Karajgaonkar wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56995/
> -----------------------------------------------------------
> 
> (Updated Feb. 27, 2017, 6:44 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Ashutosh Chauhan, and Sergio Pena.
> 
> 
> Bugs: HIVE-15879
>     https://issues.apache.org/jira/browse/HIVE-15879
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-15879 : Fix HiveMetaStoreChecker.checkPartitionDirs method
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java 
> 7c94c95f00492467ba27dedc9ce513e13c85ea61 
>   
> ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreChecker.java 
> 35f52cd522e0e48a333e30966245bec65cc2ec9c 
> 
> Diff: https://reviews.apache.org/r/56995/diff/
> 
> 
> Testing
> -------
> 
> Tested using existing and newly added test cases
> 
> 
> Thanks,
> 
> Vihang Karajgaonkar
> 
>

Re: Review Request 56995: HIVE-15879 : Fix HiveMetaStoreChecker.checkPartitionDirs method

Reply via email to