[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

John Sichi (JIRA) Mon, 23 May 2011 11:15:33 -0700

    [ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038104#comment-13038104 ]


John Sichi commented on HIVE-1694:
----------------------------------

I collected comments from last week's review meeting below.

* The rewrite needs to check to make sure that the index partitions are 
available (matching the referenced table partitions).  You can take a look at 
the way the Harvey Mudd team handles this, and maybe reuse their code.  This 
implies that predicate pushdown and partition pruning need to happen BEFORE the 
rewrite is applied (currently the rewrite happens before them).

* Isn't it a bug that the GROUP BY is removed in some cases?  The index may 
store multiple rows for the same base table key (since FILENAME is part of the 
index table key), so it seems like a GROUP BY should always be required for 
removing those duplicates.

* Where is _countall used instead of _countkey?  Also, what happens if the 
index is compound (multiple columns in its key)?

* Add a test case for a query in which a table scan is reused in a directed 
acyclic graph, e.g. a UNION where one branch of the union does a rewritable 
GROUP BY on the table and the other branch just reads the table directly.  We 
want to make sure that in this case, the rewrite's replacement of the base 
table in one branch does not corrupt the other branch in any way.
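
To make the duplicate-row concern in the second point concrete, here is a minimal sketch in Python rather than HiveQL. The row layout (key, file name, per-file count) and every name in it are assumptions for illustration only; they are not taken from the patch or from the actual index table schema. It only shows why projecting counts straight from such an index, without a GROUP BY, can return several rows for one base table key.

```python
from collections import defaultdict

# Hypothetical index rows: (key, file_name, count_of_key).
# Because the file name is part of the index key, the same base table
# key shows up once per file that contains it.
index_rows = [
    ("a", "file_0", 2),
    ("a", "file_1", 3),
    ("b", "file_0", 1),
]

def counts_without_group_by(rows):
    """Naive rewrite: project (key, count) directly from the index rows."""
    return [(key, count) for key, _file, count in rows]

def counts_with_group_by(rows):
    """Rewrite that keeps the GROUP BY: sum the per-file counts per key."""
    totals = defaultdict(int)
    for key, _file, count in rows:
        totals[key] += count
    return sorted(totals.items())

naive = counts_without_group_by(index_rows)
grouped = counts_with_group_by(index_rows)
print(naive)    # key "a" appears twice, once per file
print(grouped)  # one row per key, with the per-file counts summed
```

In this toy layout the naive projection yields two rows for key "a", while the grouped version yields one row per key, which is the behavior a GROUP BY on the base table would give.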

After these have been addressed (along with the existing review board comments) 
and you've had a chance to rebase the patch, we'll do another pass.

Thanks again!


> Accelerate GROUP BY execution using indexes
> -------------------------------------------
>
>                 Key: HIVE-1694
>                 URL: https://issues.apache.org/jira/browse/HIVE-1694
>             Project: Hive
>          Issue Type: New Feature
>          Components: Indexing, Query Processor
>    Affects Versions: 0.7.0
>            Reporter: Nikhil Deshpande
>            Assignee: Prajakta Kalmegh
>         Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, 
> HIVE-1694.3.patch.txt, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql
>
>
> The index building patch (HIVE-417) is checked into trunk; this JIRA issue 
> tracks supporting indexes in the Hive compiler & execution engine for SELECT 
> queries.
> This is in ref. to John's comment at
> https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
> on creating a separate JIRA issue for tracking index usage in the optimizer & 
> query execution.
> The aim of this effort is to use indexes to accelerate query execution (for 
> certain classes of queries). E.g.
> - Filters and range scans (already being worked on by He Yongqiang as part of 
> HIVE-417?)
> - Joins (index based joins)
> - Group By, Order By and other misc cases
> The proposal is multi-step:
> 1. Building index based operators, compiler and execution engine changes
> 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose 
> between index scans, full table scans etc.)
> This JIRA initially focuses on the first step. This JIRA is expected to hold 
> the information about index based plans & operator implementations for the 
> above-mentioned cases. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
