Re: Review Request: Use sorted nature of compact indexes

namit jain Tue, 01 Nov 2011 11:07:28 -0700

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2605/#review2987
-----------------------------------------------------------




trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveBinarySearchRecordReader.java
<https://reviews.apache.org/r/2605/#comment6699>

    Do you need to override these functions?
    They should be the same as in HiveRR.


- namit


On 2011-10-29 01:39:50, Kevin Wilfong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2605/
> -----------------------------------------------------------
> 
> (Updated 2011-10-29 01:39:50)
> 
> 
> Review request for hive, Yongqiang He, Ning Zhang, and namit jain.
> 
> 
> Summary
> -------
> 
> The CompactIndexHandler determines whether the reentrant query it creates is 
> a candidate for exploiting the fact that the index is sorted (it has an 
> appropriate number of non-partition conditions, and the query plan has the 
> expected form).  If so, it sets the input format to HiveSortedInputFormat and 
> marks the FilterOperator for the non-partition condition.
> 
> The HiveSortedInputFormat extends HiveInputFormat, so its splits consist 
> of data from a single file, and its record reader is 
> HiveBinarySearchRecordReader.  HiveBinarySearchRecordReader starts by 
> assuming it is performing a binary search.  It sets the appropriate flags in 
> IOContext, which acts as the means of communication between the 
> FilterOperators and the record reader.  The non-partition FilterOperator is 
> responsible for executing a comparison between the value in the row and 
> column of interest and the constant.  It also provides the type of the 
> generic UDF.  It sets this data in the IOContext.  As long as the binary 
> search continues the FilterOperators do not forward rows to the operators 
> below them.  The record reader uses the comparison and the type of the 
> generic UDF to execute a binary search on the underlying RCFile until it 
> finds the block of interest, or determines that if any block is of interest 
> it is the last one.  The search then proceeds linearly from the beginning of 
> the identified block.  If a problem occurs at any point during the binary 
> search, such as the comparison failing for some reason, a linear search 
> begins from the beginning of the data that has not yet been eliminated.
> 
> Regardless of whether or not a binary search is performed, the record reader 
> attempts to end the linear search as soon as it can based on the comparison 
> and the type of the generic UDF.
> 
> 
> This addresses bug HIVE-2535.
>     https://issues.apache.org/jira/browse/HIVE-2535
> 
> 
> Diffs
> -----
> 
>   trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1183507
>   trunk/conf/hive-default.xml 1183507
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java 1183507
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FilterOperator.java 1183507
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 1183507
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveBinarySearchRecordReader.java PRE-CREATION
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 1183507
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveRecordReader.java 1183507
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveSortedInputFormat.java PRE-CREATION
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/io/IOContext.java 1183507
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java 1183507
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFileRecordReader.java 1183507
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/FilterDesc.java 1183507
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseCompare.java 1183507
>   trunk/ql/src/test/org/apache/hadoop/hive/ql/hooks/VerifyHiveSortedInputFormatUsedHook.java PRE-CREATION
>   trunk/ql/src/test/org/apache/hadoop/hive/ql/io/TestHiveBinarySearchRecordReader.java PRE-CREATION
>   trunk/ql/src/test/queries/clientpositive/index_compact_binary_search.q PRE-CREATION
>   trunk/ql/src/test/results/clientpositive/index_compact_binary_search.q.out PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/2605/diff
> 
> 
> Testing
> -------
> 
> I added a test to verify the functionality of the 
> HiveBinarySearchRecordReader.
> 
> I also added a .q file to test that this returns the correct results, with 
> all of the supported operators, both when the underlying index is stored in 
> an RCFile and when it is stored as a text file.
> 
> I ran the .q files to verify they still pass.
> 
> I ran some queries to verify there was a CPU benefit to doing this.  I saw as 
> much as a 45% reduction in the total CPU used by the map reduce job to scan 
> the index, for a large data set. 
> 
> 
> Thanks,
> 
> Kevin
> 
>
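
For readers following the thread, the search strategy described in the quoted Summary (a binary search to find the block of interest, a linear scan from the start of that block, and an early stop based on the comparison result and the operator type) can be illustrated with a minimal sketch. This is not the code under review: the class SortedBlockSearchSketch, the Op enum, and the modelling of a file as an array of sorted, non-empty int blocks are all assumptions made for illustration. The IOContext signalling between the FilterOperators and the record reader, and the fallback to a linear search when a comparison fails, are not modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; every name here is hypothetical, not Hive's API.
public class SortedBlockSearchSketch {

    /** Comparison operators understood by this sketch. */
    public enum Op { EQUAL, LESS_THAN, GREATER_THAN }

    /**
     * Binary search over the first key of each block. Returns the index of
     * the last block whose first key is strictly less than the constant, or
     * 0 if there is no such block. Matching rows cannot begin earlier.
     */
    public static int findStartBlock(int[][] blocks, int constant) {
        int lo = 0;
        int hi = blocks.length - 1;
        int start = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (blocks[mid][0] < constant) {
                start = mid;      // candidate; a later block may still qualify
                lo = mid + 1;
            } else {
                hi = mid - 1;     // this block starts at or after the constant
            }
        }
        return start;
    }

    /** Returns the keys that satisfy "key op constant", in sorted order. */
    public static List<Integer> search(int[][] blocks, Op op, int constant) {
        List<Integer> out = new ArrayList<>();
        // For LESS_THAN the matches start at the first row, so skip the binary search.
        int startBlock = (op == Op.LESS_THAN) ? 0 : findStartBlock(blocks, constant);
        for (int b = startBlock; b < blocks.length; b++) {
            for (int key : blocks[b]) {
                int cmp = Integer.compare(key, constant);
                boolean matches = (op == Op.EQUAL && cmp == 0)
                        || (op == Op.LESS_THAN && cmp < 0)
                        || (op == Op.GREATER_THAN && cmp > 0);
                if (matches) {
                    out.add(key);
                }
                // Early termination: the data is sorted, so once the scan is
                // past the matching range no later row can match.
                boolean pastRange = (op == Op.EQUAL && cmp > 0)
                        || (op == Op.LESS_THAN && cmp >= 0);
                if (pastRange) {
                    return out;
                }
            }
        }
        return out;
    }
}
```

With blocks {1, 2, 3}, {3, 3, 5}, {7, 8, 9} and the condition EQUAL 3, the binary search selects the first block, because the second block starts with 3 and the value may already appear at the end of the block before it. The linear scan then stops at 5 without touching the third block. Comparing against each block's first key with a strict "less than" is one way to handle duplicate keys that span a block boundary; the patch itself may resolve this differently.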
