[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071135#comment-13071135 ]
Prajakta Kalmegh commented on HIVE-1694:
----------------------------------------

Hi John,

Please find attached the latest patch (HIVE-1694.4.patch).

The patch contains:
1. Support for multiple aggregates in index creation using the AggregateIndexHandler. The column names for the index schema are constructed dynamically from the aggregates. For 'aggregateFunction(columnName)', the column name in the index will be `_aggregateFunction_of_columnName`. For example, for count(l_shipdate), the column name will be `_count_of_l_shipdate`. For the 'count(*)' function, the column name will be `_count_of_all`.
2. A fix for the bug with duplicates in group-by removal cases. We no longer remove the group-by in any case. This has made the query rewrite logic much simpler than before. We removed 4 classes (RewriteIndexSubqueryCtx.java, RewriteIndexSubqueryProcFactory.java, RewriteRemoveGroupbyCtx.java, RewriteRemoveGroupbyProcFactory.java) from the previous patch and added two new, simpler classes instead (RewriteQueryUsingAggregateIndex.java, RewriteQueryUsingAggregateIndexCtx.java).
3. A new query (with 'UNION ALL') in the same ql_rewrite_gbtoidx.q file to demonstrate the requirement from your last post. Please note that the query is not a valid real-world use case, but it suffices to show that rewriting one branch does not corrupt the other branch.
4. The rewrite optimization now happens after the PredicatePushdown, PartitionPruner and PartitionConditionRemover.

This patch does not contain:
1. Optimization for cases with multiple aggregates in the selection
2. Optimization for any aggregate function other than count
3. Optimization for queries involving multiple tables (even if they are in different branches). Since we are not optimizing for joins, the constraint also filters out union queries whose branches use different tables.
4. Optimizations for indexes with multiple columns in their key

Here is the review board link for the patch: <https://reviews.apache.org/r/1194/>.

Please let me know if you have any questions.

> Accelerate GROUP BY execution using indexes
> -------------------------------------------
>
>                 Key: HIVE-1694
>                 URL: https://issues.apache.org/jira/browse/HIVE-1694
>             Project: Hive
>          Issue Type: New Feature
>          Components: Indexing, Query Processor
>    Affects Versions: 0.7.0
>            Reporter: Nikhil Deshpande
>            Assignee: Prajakta Kalmegh
>         Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt,
> HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694_2010-10-28.diff,
> demo_q1.hql, demo_q2.hql
>
>
> The index building patch (HIVE-417) is checked into trunk; this JIRA issue
> tracks supporting indexes in the Hive compiler & execution engine for SELECT
> queries.
> This is in ref. to John's comment at
> https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
> on creating a separate JIRA issue for tracking index usage in the optimizer & query
> execution.
> The aim of this effort is to use indexes to accelerate query execution (for
> a certain class of queries). E.g.
> - Filters and range scans (already being worked on by He Yongqiang as part of
> HIVE-417?)
> - Joins (index based joins)
> - Group By, Order By and other misc cases
> The proposal is multi-step:
> 1. Building index based operators, compiler and execution engine changes
> 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose
> between index scans, full table scans etc.)
> This JIRA initially focuses on the first step. This JIRA is expected to hold
> the information about index based plans & operator implementations for the above
> mentioned cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
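
The index column naming rule described in item 1 of the comment can be illustrated with a small standalone sketch. This is not code from the patch: the class `IndexColumnNames` and the method `indexColumnName` are hypothetical names chosen for illustration, and mapping `*` to `all` for every function (the comment only states it for 'count(*)') is an assumption.

```java
// Hypothetical illustration of the naming rule; not taken from HIVE-1694.4.patch.
class IndexColumnNames {

    // Builds the index column name for an aggregate call such as count(l_shipdate).
    // Assumption: a "*" argument is rendered as "all", as the comment states for count(*).
    static String indexColumnName(String aggregateFunction, String columnName) {
        String target = "*".equals(columnName.trim()) ? "all" : columnName.trim();
        return "_" + aggregateFunction.trim() + "_of_" + target;
    }

    public static void main(String[] args) {
        // The two examples given in the comment.
        System.out.println(indexColumnName("count", "l_shipdate"));
        System.out.println(indexColumnName("count", "*"));
    }
}
```

Under this rule each aggregate in the index definition gets its own distinct column, which is presumably what lets an index carry several aggregates at once.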