[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

[email protected] (JIRA) Mon, 01 Aug 2011 09:44:48 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073604#comment-13073604
 ]

[email protected] commented on HIVE-1694:
-----------------------------------------------------

bq.  On 2011-07-28 21:40:30, John Sichi wrote:
bq.  > ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java, line 61
bq.  > <https://reviews.apache.org/r/1194/diff/1/?file=27052#file27052line61>
bq.  >
bq.  >     Please run ant checkstyle and fix all the formatting discrepancies it reports for your new files.
bq.  >

Done! The only remaining checkstyle errors are in places where we use 
LinkedHashMap, HashMap and ArrayList. The error states 
"Declaring variables, return values or parameters of type 'HashMap' is not 
allowed".
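
The message quoted above looks like the one produced by checkstyle's IllegalType check, which usually stops firing once the declared type is the interface (Map, List) and the concrete class appears only in the constructor call. A minimal sketch of that pattern follows; the class and method names are hypothetical and are not taken from the patch, and names are assumed to be non-empty strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not code from the HIVE-1694 patch.
final class InterfaceTypedDeclarations {

  // Return type, parameter and locals are declared as Map/List.
  // LinkedHashMap and ArrayList appear only where objects are constructed,
  // so insertion order is still preserved.
  static Map<String, List<String>> groupByFirstLetter(List<String> names) {
    Map<String, List<String>> groups = new LinkedHashMap<String, List<String>>();
    for (String name : names) {
      String key = name.substring(0, 1); // assumes name is non-empty
      List<String> bucket = groups.get(key);
      if (bucket == null) {
        bucket = new ArrayList<String>();
        groups.put(key, bucket);
      }
      bucket.add(name);
    }
    return groups;
  }
}
```

Callers that only need Map or List behaviour are unaffected by the change, since the runtime objects are the same.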

bq.  On 2011-07-28 21:40:30, John Sichi wrote:
bq.  > ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java, line 184
bq.  > <https://reviews.apache.org/r/1194/diff/1/?file=27052#file27052line184>
bq.  >
bq.  >     Don't you need to reuse the compact implementation here so that the index can be used for WHERE (not just GROUP BY)?
bq.  >

AggregateIndexHandler now extends CompactIndexHandler instead of 
TableBasedIndexHandler. We override only the analyzeIndexDefinition(...) and 
getIndexBuilderMapRedTask(...) methods of CompactIndexHandler.
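
The shape of that change can be sketched with stand-in classes. Everything below is an assumption made for illustration: the class names, the no-argument signatures and the string return values are hypothetical, and the real Hive methods take different parameters. The point is only that the subclass overrides two methods and inherits the rest, including whatever the compact handler provides for WHERE.

```java
// Stand-in for CompactIndexHandler, with simplified hypothetical signatures.
class CompactHandlerSketch {
  // Not overridden below, so the subclass inherits it unchanged.
  String describeFilterSupport() {
    return "WHERE via compact index";
  }

  String analyzeIndexDefinition() {
    return "compact index columns";
  }

  String getIndexBuilderTask() {
    return "compact build task";
  }
}

// Stand-in for AggregateIndexHandler: overrides exactly two methods.
class AggregateHandlerSketch extends CompactHandlerSketch {
  @Override
  String analyzeIndexDefinition() {
    // Reuse the parent's definition and extend it.
    return super.analyzeIndexDefinition() + " + aggregate column";
  }

  @Override
  String getIndexBuilderTask() {
    return "aggregate build task";
  }
}
```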

bq.  On 2011-07-28 21:40:30, John Sichi wrote:
bq.  > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 700
bq.  > <https://reviews.apache.org/r/1194/diff/1/?file=27054#file27054line700>
bq.  >
bq.  >     This method is redundant now.

Removed. Sorry to have missed that.

bq.  On 2011-07-28 21:40:30, John Sichi wrote:
bq.  > ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java, line 252
bq.  > <https://reviews.apache.org/r/1194/diff/1/?file=27056#file27056line252>
bq.  >
bq.  >     I can't think of a case where it would be worse.

Ok.

bq.  On 2011-07-28 21:40:30, John Sichi wrote:
bq.  > ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java, line 164
bq.  > <https://reviews.apache.org/r/1194/diff/1/?file=27057#file27057line164>
bq.  >
bq.  >     Actually group-by is now preserved in all cases.

Forgot to change a few comments after fixing the bug. Done!

bq.  On 2011-07-28 21:40:30, John Sichi wrote:
bq.  > ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java, line 66
bq.  > <https://reviews.apache.org/r/1194/diff/1/?file=27058#file27058line66>
bq.  >
bq.  >     Please use HTML bullet syntax for Javadoc (otherwise it all gets run together into one line when rendered).
bq.  >

Done!
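
For reference, a Javadoc comment using HTML list markup looks roughly like the sketch below. The class, the method and the two listed conditions are hypothetical and are not copied from RewriteGBUsingIndex.java.

```java
/**
 * Hypothetical example of a Javadoc comment that lists conditions.
 *
 * <p>The rewrite is attempted only when:
 * <ul>
 *   <li>the query has a GROUP BY clause, and</li>
 *   <li>an aggregate index exists on the grouped column.</li>
 * </ul>
 */
final class JavadocBulletExample {
  // Mirrors the two conditions listed in the comment above.
  static boolean canRewrite(boolean hasGroupBy, boolean hasAggregateIndex) {
    return hasGroupBy && hasAggregateIndex;
  }
}
```

With plain "*" or "-" bullets, the rendered HTML collapses the items into a single paragraph, which is the problem the review comment describes.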

bq.  On 2011-07-28 21:40:30, John Sichi wrote:
bq.  > ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java, line 93
bq.  > <https://reviews.apache.org/r/1194/diff/1/?file=27060#file27060line93>
bq.  >
bq.  >     Shouldn't this be BIGINT?
bq.  >     
bq.  >     Also, I think you're supposed to use a TypeInfoFactory for this purpose.

Yes. Changed it to bigint. Also changed it in AggregateIndexHandler where I had 
declared the type to be "int".
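
Why the width matters can be seen with plain Java arithmetic, where int is 32 bits and long is 64 bits (the width of Hive's BIGINT). This is only a sketch with hypothetical names; it does not use any Hive API.

```java
// Hypothetical illustration of 32-bit versus 64-bit count arithmetic.
final class CountWidthExample {

  // Adds two partial counts in 32-bit arithmetic; wraps around on overflow.
  static int addAsInt(int a, int b) {
    return a + b;
  }

  // Adds two partial counts in 64-bit arithmetic; no wrap for these inputs.
  static long addAsLong(long a, long b) {
    return a + b;
  }
}
```

Adding 1 to Integer.MAX_VALUE with addAsInt wraps to Integer.MIN_VALUE, while addAsLong returns 2147483648, so an "int" count column could go wrong once the summed counts exceed 2^31 - 1.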

bq.  On 2011-07-28 21:40:30, John Sichi wrote:
bq.  > ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java, line 603
bq.  > <https://reviews.apache.org/r/1194/diff/1/?file=27062#file27062line603>
bq.  >
bq.  >     Not sure why this new constructor is needed...after using it, all you do is get the table out of it.

The only other constructor for tableSpec takes an ASTNode as one of its 
parameters. Since we need to construct a new tableSpec using only the index 
table name, and we do not have an ASTNode for it, I need this constructor. If 
you have another approach in mind, please let me know. That would be helpful.

bq.  On 2011-07-28 21:40:30, John Sichi wrote:
bq.  > ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q, line 27
bq.  > <https://reviews.apache.org/r/1194/diff/1/?file=27063#file27063line27>
bq.  >
bq.  >     This should *not* be using the index, since the index is built on count(l_shipdate), and l_shipdate may contain nulls, whereas the query is referencing count(1), which is insensitive to nulls.

Yes, true. :(
I have now replaced the count(1) queries with count(l_shipdate) in the 
ql_rewrite_gbtoidx.q file. I also verified that the count(1) queries do not 
use the index.
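
The difference between the two aggregates can be shown with a small sketch in plain Java (no Hive API; the names and sample dates are hypothetical). COUNT(1) counts every row, while COUNT(col) counts only rows where col is not null, so a stored count(l_shipdate) cannot answer a count(1) query when the column contains nulls.

```java
import java.util.List;

// Hypothetical illustration of SQL COUNT semantics over one nullable column.
final class CountNullSemantics {

  // Behaves like COUNT(1): every row is counted, null or not.
  static long countRows(List<String> column) {
    return column.size();
  }

  // Behaves like COUNT(col): rows whose value is null are skipped.
  static long countNonNull(List<String> column) {
    long n = 0;
    for (String value : column) {
      if (value != null) {
        n++;
      }
    }
    return n;
  }
}
```

For a column holding two dates and one null, countRows gives 3 and countNonNull gives 2.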

bq.  On 2011-07-28 21:40:30, John Sichi wrote:
bq.  > ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q, line 61
bq.  > <https://reviews.apache.org/r/1194/diff/1/?file=27063#file27063line61>
bq.  >
bq.  >     Need additional tests to verify all the cases where the optimization should *not* be used:
bq.  >     
bq.  >     * when configuration disables it
bq.  >     * when index partitions do not cover table partitions (I still don't see the code for this case)
bq.  >     * ... all the other conditions checked for in the code ...
bq.  >

Added new queries to verify that the optimization is not used:
* when configuration disables it
* ... all the other conditions checked for in the code ...

The case "* when index partitions do not cover table partitions" is still 
pending (as is the code for it). I will upload the new patch once this is done.
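
One possible reading of "cover" is plain set containment: every table partition must also be present in the index. The sketch below is only that assumption written out. It is not the pending code, and the class name, method name and partition-name strings are hypothetical.

```java
import java.util.Set;

// Hypothetical sketch of a partition coverage check; not the pending patch code.
final class PartitionCoverageSketch {

  // True only if every table partition name also appears among the index
  // partitions, i.e. the index could answer for the whole table.
  static boolean indexCoversTable(Set<String> tablePartitions,
                                  Set<String> indexPartitions) {
    return indexPartitions.containsAll(tablePartitions);
  }
}
```

Under this reading, the rewrite would be skipped whenever indexCoversTable returns false.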

- Prajakta

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/#review1212
-----------------------------------------------------------

On 2011-07-26 14:44:01, Prajakta Kalmegh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1194/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-07-26 14:44:01)
bq.  
bq.  
bq.  Review request for hive and John Sichi.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This patch defines a new AggregateIndexHandler, which is used to optimize the query plan for GROUP BY queries.
bq.  
bq.  
bq.  This addresses bug HIVE-1694.
bq.      https://issues.apache.org/jira/browse/HIVE-1694
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f
bq.    ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java PRE-CREATION
bq.    ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff
bq.    ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2ca63b3
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java PRE-CREATION
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java PRE-CREATION
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java PRE-CREATION
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java PRE-CREATION
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java PRE-CREATION
bq.    ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java PRE-CREATION
bq.    ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 77a6dc6
bq.    ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION
bq.    ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION
bq.  
bq.  Diff: https://reviews.apache.org/r/1194/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Prajakta
bq.  
bq.

> Accelerate GROUP BY execution using indexes
> -------------------------------------------
>
>                 Key: HIVE-1694
>                 URL: https://issues.apache.org/jira/browse/HIVE-1694
>             Project: Hive
>          Issue Type: New Feature
>          Components: Indexing, Query Processor
>    Affects Versions: 0.7.0
>            Reporter: Nikhil Deshpande
>            Assignee: Prajakta Kalmegh
>         Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, 
> HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694_2010-10-28.diff, 
> demo_q1.hql, demo_q2.hql
>
>
> The index building patch (HIVE-417) is checked into trunk; this JIRA issue 
> tracks supporting indexes in the Hive compiler & execution engine for SELECT 
> queries.
> This is in ref. to John's comment at
> https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
> on creating separate JIRA issue for tracking index usage in optimizer & query 
> execution.
> The aim of this effort is to use indexes to accelerate query execution (for 
> certain classes of queries). E.g.
> - Filters and range scans (already being worked on by He Yongqiang as part of 
> HIVE-417?)
> - Joins (index based joins)
> - Group By, Order By and other misc cases
> The proposal is multi-step:
> 1. Building index based operators, compiler and execution engine changes
> 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose 
> between index scans, full table scans etc.)
> This JIRA initially focuses on the first step. This JIRA is expected to hold 
> the information about index based plans & operator implementations for above 
> mentioned cases. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
