[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000490#comment-13000490 ]
John Sichi commented on HIVE-1694:
----------------------------------

I'd like to propose a fourth option instead: create a new handler type which stores both the count and the offsets together, so that it can be used for both aggregation and filtering.

The index build can still be done with a single GROUP BY, but now with three aggregate expressions in the SELECT list: collect_set(BLOCK__OFFSET__INSIDE__FILE), COUNT(`l_shipdate`), COUNT(*). For a column known to be NOT NULL, just COUNT(*) is good enough, but Hive doesn't currently have that metadata.

You could also use IDXPROPERTIES to allow for additional expressions (SUM/MAX/MIN, complex expressions, etc), making these start to look more like materialized aggregate views.

In HIVE-1803, they are working on factoring out some of the generic parts of compact index handler for reuse; we should depend on that for the aggregate index handler to avoid duplicating code.

> Accelerate GROUP BY execution using indexes
> -------------------------------------------
>
>                 Key: HIVE-1694
>                 URL: https://issues.apache.org/jira/browse/HIVE-1694
>             Project: Hive
>          Issue Type: New Feature
>          Components: Indexing, Query Processor
>    Affects Versions: 0.7.0
>            Reporter: Nikhil Deshpande
>            Assignee: Nikhil Deshpande
>         Attachments: HIVE-1694.1.patch.txt, HIVE-1694_2010-10-28.diff,
> demo_q1.hql, demo_q2.hql
>
>
> The index building patch (Hive-417) is checked into trunk, this JIRA issue
> tracks supporting indexes in Hive compiler & execution engine for SELECT
> queries.
> This is in ref. to John's comment at
> https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
> on creating separate JIRA issue for tracking index usage in optimizer & query
> execution.
> The aim of this effort is to use indexes to accelerate query execution (for
> certain class of queries). E.g.
> - Filters and range scans (already being worked on by He Yongqiang as part of
> HIVE-417?)
> - Joins (index based joins)
> - Group By, Order By and other misc cases
> The proposal is multi-step:
> 1. Building index based operators, compiler and execution engine changes
> 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose
> between index scans, full table scans etc.)
> This JIRA initially focuses on the first step. This JIRA is expected to hold
> the information about index based plans & operator implementations for above
> mentioned cases.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
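
To make the aggregate index build proposed in the comment concrete, here is a minimal Python sketch of what one GROUP BY pass with the three aggregates (a set of block offsets, a non-NULL count of the key, and a row count) would compute. It is not Hive code: the function names, the `(block_offset, record)` row shape, and the dictionary layout are all hypothetical, chosen only to illustrate how one index entry could serve both aggregation and filtering.

```python
from collections import defaultdict


def build_aggregate_index(rows, key_column):
    """Simulate one GROUP BY pass over (block_offset, record) pairs.

    Hypothetical stand-in for the proposed index build; each group keeps:
      offsets   - like collect_set(block offset), usable for filtering
      count_key - like COUNT(key_column), which skips NULL keys
      count_all - like COUNT(*), which counts every row
    """
    index = defaultdict(
        lambda: {"offsets": set(), "count_key": 0, "count_all": 0}
    )
    for block_offset, record in rows:
        key = record.get(key_column)
        entry = index[key]
        entry["offsets"].add(block_offset)
        entry["count_all"] += 1
        if key is not None:  # COUNT(column) ignores NULLs
            entry["count_key"] += 1
    return dict(index)


def group_counts(index):
    """Answer a GROUP BY key / COUNT(key) query from the index alone."""
    return {key: entry["count_key"] for key, entry in index.items()}


def offsets_for(index, key):
    """Return the block offsets to scan for an equality filter on key."""
    entry = index.get(key)
    return set(entry["offsets"]) if entry else set()
```

The two counts only differ for the NULL group, which is why the comment notes that COUNT(*) alone would suffice for a column known to be NOT NULL.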