[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes

John Sichi (JIRA) Mon, 28 Feb 2011 11:52:03 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000490#comment-13000490
 ]


John Sichi commented on HIVE-1694:
----------------------------------

I'd like to propose a fourth option instead:  create a new handler type which 
stores both the count and the offsets together, so that it can be used for both 
aggregation and filtering.  The index build can still be done with a single 
GROUP BY, but now with three aggregate expressions in the SELECT list:  
collect_set(BLOCK__OFFSET__INSIDE__FILE), COUNT(`l_shipdate`), COUNT(*).  For a 
column known to be NOT NULL, just COUNT(*) is good enough, but Hive doesn't 
currently have that metadata.  You could also use IDXPROPERTIES to allow for 
additional expressions (SUM/MAX/MIN, complex expressions, etc), making these 
start to look more like materialized aggregate views.
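
To make the proposed index contents concrete, here is a rough sketch in Python 
(not Hive code) of what one GROUP BY pass would store per key and how the same 
entry could serve both aggregation and filtering.  The toy rows and the names 
build_aggregate_index, group_by_count and offsets_for are made up for 
illustration; an actual handler would presumably build this with a HiveQL 
GROUP BY over the base table rather than an in-memory loop.

```python
from collections import defaultdict

# Toy base table rows: (block offset inside file, l_shipdate).
# None models a SQL NULL in the indexed column.
rows = [
    (0, "1998-01-01"),
    (0, "1998-01-02"),
    (64, "1998-01-01"),
    (128, None),
    (128, "1998-01-01"),
]


def build_aggregate_index(rows):
    """Emulate one GROUP BY on the key with three aggregates:
    collect_set(offset), COUNT(key), COUNT(*)."""
    index = defaultdict(
        lambda: {"offsets": set(), "count_key": 0, "count_star": 0}
    )
    for offset, key in rows:
        entry = index[key]
        entry["offsets"].add(offset)   # collect_set(offset)
        entry["count_star"] += 1       # COUNT(*)
        if key is not None:
            entry["count_key"] += 1    # COUNT(key) skips NULLs
    return dict(index)


def group_by_count(index):
    """Answer SELECT key, COUNT(key) ... GROUP BY key from the index alone."""
    return {key: entry["count_key"] for key, entry in index.items()}


def offsets_for(index, key):
    """Answer a filter key = <value> by returning the blocks to scan."""
    entry = index.get(key)
    return sorted(entry["offsets"]) if entry else []


index = build_aggregate_index(rows)
```

The NULL group shows why both counts are kept:  its COUNT(*) is 1 while its 
COUNT(key) is 0, so COUNT(*) alone is only a safe substitute when the column 
is known to be NOT NULL.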

In HIVE-1803, they are working on factoring out some of the generic parts of 
compact index handler for reuse; we should depend on that for the aggregate 
index handler to avoid duplicating code.


> Accelerate GROUP BY execution using indexes
> -------------------------------------------
>
>                 Key: HIVE-1694
>                 URL: https://issues.apache.org/jira/browse/HIVE-1694
>             Project: Hive
>          Issue Type: New Feature
>          Components: Indexing, Query Processor
>    Affects Versions: 0.7.0
>            Reporter: Nikhil Deshpande
>            Assignee: Nikhil Deshpande
>         Attachments: HIVE-1694.1.patch.txt, HIVE-1694_2010-10-28.diff, 
> demo_q1.hql, demo_q2.hql
>
>
> The index building patch (HIVE-417) is checked into trunk; this JIRA issue 
> tracks supporting indexes in the Hive compiler & execution engine for SELECT 
> queries.
> This is in ref. to John's comment at
> https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
> on creating separate JIRA issue for tracking index usage in optimizer & query 
> execution.
> The aim of this effort is to use indexes to accelerate query execution (for 
> certain classes of queries). E.g.
> - Filters and range scans (already being worked on by He Yongqiang as part of 
> HIVE-417?)
> - Joins (index based joins)
> - Group By, Order By and other misc cases
> The proposal is multi-step:
> 1. Building index based operators, compiler and execution engine changes
> 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose 
> between index scans, full table scans etc.)
> This JIRA initially focuses on the first step. This JIRA is expected to hold 
> the information about index-based plans & operator implementations for the 
> above-mentioned cases. 

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
