[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

Prajakta Kalmegh (JIRA) Tue, 26 Jul 2011 07:53:34 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071135#comment-13071135
 ]


Prajakta Kalmegh commented on HIVE-1694:
----------------------------------------

Hi John

Please find attached the latest patch (HIVE-1694.4.patch).
The patch contains:
1. Support for multiple aggregates in index creation using the 
AggregateIndexHandler. The column names for the index schema are constructed 
dynamically depending on the aggregates. 
For 'aggregateFunction(columnName)', the column name in index will be 
`_aggregateFunction_of_columnName`. 
For example, for count(l_shipdate), the column name will be 
`_count_of_l_shipdate`.
For 'count(*)' function, the column name will be `_count_of_all`.

2. Fixed the bug with duplicates in the group-by removal cases. We no longer 
remove the group-by in any case, which makes the query rewrite logic much 
simpler than before. 
We removed 4 classes (RewriteIndexSubqueryCtx.java, 
RewriteIndexSubqueryProcFactory.java, RewriteRemoveGroupbyCtx.java, 
RewriteRemoveGroupbyProcFactory.java) from the previous patch and added two 
new, simpler classes instead (RewriteQueryUsingAggregateIndex.java, 
RewriteQueryUsingAggregateIndexCtx.java). 

3. Added a new query (with 'UNION ALL') to the same ql_rewrite_gbtoidx.q file 
to demonstrate the requirement from your last post. Please note that the query 
is not a realistic real-world use case, but it suffices for our purpose of 
showing that rewriting one branch does not corrupt the other branch.

4. Rewrite Optimization now happens after the PredicatePushdown, 
PartitionPruner and PartitionConditionRemover.
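
As a rough illustration of the naming rule in item 1, here is a minimal Python 
sketch. It is not the Java code from the patch; the function name 
index_column_name is hypothetical and only mirrors the rule described above.

```python
def index_column_name(aggregate_function: str, column_name: str) -> str:
    """Build the index column name for aggregate_function(column_name).

    Hypothetical helper: it mirrors the naming rule described in the
    comment, not the actual AggregateIndexHandler implementation.
    """
    func = aggregate_function.strip()
    col = column_name.strip()
    # count(*) has no concrete column, so the suffix "all" is used instead.
    if col == "*":
        col = "all"
    return f"_{func}_of_{col}"


if __name__ == "__main__":
    print(index_column_name("count", "l_shipdate"))
    print(index_column_name("count", "*"))
```

With the examples from item 1, count(l_shipdate) maps to 
`_count_of_l_shipdate` and count(*) maps to `_count_of_all`.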

This patch does not contain:
1. Optimization for cases with multiple aggregates in the selection
2. Optimization for any other aggregate function apart from count
3. Optimization for queries involving multiple tables (even if they are in a 
different branch). Since we are not optimizing for case of joins, the 
constraint also filters out queries which have different tables in union 
queries.
4. Optimization for indexes with multiple columns in their key
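
Read together, the four exclusions above amount to an eligibility check along 
the following lines. This is only a Python sketch under stated assumptions: 
QueryShape and can_rewrite are hypothetical names, and the real check in the 
patch operates on Hive's operator tree in Java and may apply further 
conditions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class QueryShape:
    """Hypothetical summary of a query; not a structure from the patch."""
    tables: List[str]                  # every table referenced, across all branches
    aggregates: List[Tuple[str, str]]  # (function, column) pairs in the selection
    index_key_columns: List[str]       # key columns of the candidate index


def can_rewrite(query: QueryShape) -> bool:
    """Return True only if none of the four listed exclusions applies."""
    # Exclusion 3: a single base table only, even across union branches.
    if len(set(query.tables)) != 1:
        return False
    # Exclusion 1: exactly one aggregate in the selection.
    if len(query.aggregates) != 1:
        return False
    # Exclusion 2: count is the only aggregate function handled.
    if query.aggregates[0][0].lower() != "count":
        return False
    # Exclusion 4: the index key must be a single column.
    if len(query.index_key_columns) != 1:
        return False
    return True
```

Under this sketch, a 'UNION ALL' query whose branches both read the same table 
still qualifies, while a union over two different tables does not.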

Here is the review board link for the patch 
<https://reviews.apache.org/r/1194/>.

Please let me know if you have any questions.


> Accelerate GROUP BY execution using indexes
> -------------------------------------------
>
>                 Key: HIVE-1694
>                 URL: https://issues.apache.org/jira/browse/HIVE-1694
>             Project: Hive
>          Issue Type: New Feature
>          Components: Indexing, Query Processor
>    Affects Versions: 0.7.0
>            Reporter: Nikhil Deshpande
>            Assignee: Prajakta Kalmegh
>         Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, 
> HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694_2010-10-28.diff, 
> demo_q1.hql, demo_q2.hql
>
>
> The index building patch (Hive-417) is checked into trunk, this JIRA issue 
> tracks supporting indexes in Hive compiler & execution engine for SELECT 
> queries.
> This is in ref. to John's comment at
> https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
> on creating separate JIRA issue for tracking index usage in optimizer & query 
> execution.
> The aim of this effort is to use indexes to accelerate query execution (for 
> certain class of queries). E.g.
> - Filters and range scans (already being worked on by He Yongqiang as part of 
> HIVE-417?)
> - Joins (index based joins)
> - Group By, Order By and other misc cases
> The proposal is multi-step:
> 1. Building index based operators, compiler and execution engine changes
> 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose 
> between index scans, full table scans etc.)
> This JIRA initially focuses on the first step. This JIRA is expected to hold 
> the information about index based plans & operator implementations for above 
> mentioned cases. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
