[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072550#comment-13072550 ]
jirapos...@reviews.apache.org commented on HIVE-1694: ----------------------------------------------------- ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1194/#review1212 ----------------------------------------------------------- ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java <https://reviews.apache.org/r/1194/#comment2711> Please run ant checkstyle and fix all the formatting discrepancies it reports for your new files. ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java <https://reviews.apache.org/r/1194/#comment2695> Don't you need to reuse the compact implementation here so that the index can be used for WHERE (not just GROUP BY)? ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java <https://reviews.apache.org/r/1194/#comment2696> This method is redundant now. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java <https://reviews.apache.org/r/1194/#comment2698> I can't think of a case where it would be worse. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java <https://reviews.apache.org/r/1194/#comment2699> Actually group-by is now preserved in all cases. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java <https://reviews.apache.org/r/1194/#comment2700> Please use HTML bullet syntax for Javadoc (otherwise it all gets run together into one line when rendered). ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java <https://reviews.apache.org/r/1194/#comment2701> indentation ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java <https://reviews.apache.org/r/1194/#comment2703> Shouldn't this be BIGINT? Also, I think you're supposed to use a TypeInfoFactory for this purpose. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java <https://reviews.apache.org/r/1194/#comment2702> indentation ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java <https://reviews.apache.org/r/1194/#comment2704> typo: Repace ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java <https://reviews.apache.org/r/1194/#comment2707> Not sure why this new constructor is needed...after using it, all you do is get the table out of it. ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q <https://reviews.apache.org/r/1194/#comment2709> This should *not* be using the index, since the index is built on count(l_shipdate), and l_shipdate may contain nulls, whereas the query is referencing count(1), which is insensitive to nulls. ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q <https://reviews.apache.org/r/1194/#comment2710> Need additional tests to verify all the cases where the optimization should *not* be used: * when configuration disables it * when index partitions do not cover table partitions (I still don't see the code for this case) * ... all the other conditions checked for in the code ... - John On 2011-07-26 14:44:01, Prajakta Kalmegh wrote: bq. bq. ----------------------------------------------------------- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1194/ bq. ----------------------------------------------------------- bq. bq. (Updated 2011-07-26 14:44:01) bq. bq. bq. Review request for hive and John Sichi. bq. bq. bq. Summary bq. ------- bq. bq. This patch has defined a new AggregateIndexHandler which is used to optimize the query plan for groupby queries. bq. bq. bq. This addresses bug HIVE-1694. bq. https://issues.apache.org/jira/browse/HIVE-1694 bq. bq. bq. Diffs bq. ----- bq. bq. common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f bq. ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff bq. ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2ca63b3 bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java PRE-CREATION bq. ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 77a6dc6 bq. ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION bq. ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/1194/diff bq. bq. bq. Testing bq. ------- bq. bq. bq. Thanks, bq. bq. Prajakta bq. bq. > Accelerate GROUP BY execution using indexes > ------------------------------------------- > > Key: HIVE-1694 > URL: https://issues.apache.org/jira/browse/HIVE-1694 > Project: Hive > Issue Type: New Feature > Components: Indexing, Query Processor > Affects Versions: 0.7.0 > Reporter: Nikhil Deshpande > Assignee: Prajakta Kalmegh > Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, > HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694_2010-10-28.diff, > demo_q1.hql, demo_q2.hql > > > The index building patch (Hive-417) is checked into trunk, this JIRA issue > tracks supporting indexes in Hive compiler & execution engine for SELECT > queries. > This is in ref. to John's comment at > https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 > on creating separate JIRA issue for tracking index usage in optimizer & query > execution. > The aim of this effort is to use indexes to accelerate query execution (for > certain class of queries). E.g. > - Filters and range scans (already being worked on by He Yongqiang as part of > HIVE-417?) > - Joins (index based joins) > - Group By, Order By and other misc cases > The proposal is multi-step: > 1. Building index based operators, compiler and execution engine changes > 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose > between index scans, full table scans etc.) > This JIRA initially focuses on the first step. This JIRA is expected to hold > the information about index based plans & operator implementations for above > mentioned cases. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira