[jira] Commented: (HIVE-1694) Accelerate query execution using indexes

Prajakta Kalmegh (JIRA) Fri, 31 Dec 2010 18:37:15 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976380#action_12976380
 ]


Prajakta Kalmegh commented on HIVE-1694:
----------------------------------------

Thanks to both of you for your comments on our proposed design. Since the last 
post, we have been working on the code changes as per your comments. The 
progress has been in the following areas:
1) Removed the requirement that our optimizer run first. It can now be used 
like any other optimizer by adding it to the "transformations" list.
2) Implemented changes to re-structure the operator DAG plan for group-by 
queries.
3) Our optimization no longer reads data from the QB (query block), as it did 
earlier, to check whether the rewrite can be applied before proceeding to 
apply it. (See the canApply() method in the original rewrite code.)
4) Regarding issue #3 (from my original post), as per John's suggestion, the 
changes to operator row schemas/resolvers are now made wherever applicable.
5) We have completed testing the new implementation for simple group-by cases. 
Also, the code to append a sub-query to the original DAG is currently 
implemented separately. It still needs to be integrated into our optimization.
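
To illustrate points 1 and 3, here is a minimal sketch (in Python, not the 
actual Hive code) of a rewrite that decides applicability from the operator 
plan it receives, so its position in a list of transformations does not 
matter. The Optimizer class, the operator names, and both rewrite functions 
are hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for an operator DAG: a flat list of operator names.
Plan = List[str]


@dataclass
class Optimizer:
    # Ordered list of plan-to-plan rewrites, loosely modelled on the
    # "transformations" list mentioned above (illustrative only).
    transformations: List[Callable[[Plan], Plan]] = field(default_factory=list)

    def optimize(self, plan: Plan) -> Plan:
        for transform in self.transformations:
            plan = transform(plan)
        return plan


def use_index_for_group_by(plan: Plan) -> Plan:
    # Applicability is checked on the plan itself, not on an earlier
    # query-block structure, so it does not matter what ran before.
    if "GBY" not in plan:
        return plan
    return ["TS(index)" if op == "TS(base)" else op for op in plan]


def drop_noop_select(plan: Plan) -> Plan:
    # Placeholder for some unrelated, pre-existing rewrite.
    return [op for op in plan if op != "SEL(*)"]


plan = ["TS(base)", "SEL(*)", "GBY", "FS"]
run_first = Optimizer([use_index_for_group_by, drop_noop_select]).optimize(plan)
run_last = Optimizer([drop_noop_select, use_index_for_group_by]).optimize(plan)
```

In this toy setup both orderings produce the same plan, which is the property 
point 1 is after.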

The only issue still pending after this implementation will be the one raised 
in John's post on Nov 1st stating "...we store only the distinct block 
offsets, not the distinct row offsets.....". We plan to work on this once the 
current implementation is tested end-to-end. You can expect an update on this 
in a couple of weeks.
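
A small sketch of why the block-offset issue matters for group-by rewrites. 
It assumes (this is an assumption for illustration, not a description of the 
patch) that a rewritten query derives a per-key count from the number of 
offsets the index stores for that key. The table contents and helper names 
are made up.

```python
from collections import defaultdict

# Hypothetical base table spread over two blocks.
# Each row is (block_offset, row_offset, key).
rows = [
    (0, 0, "a"), (0, 10, "a"), (0, 20, "b"),
    (4096, 4096, "a"), (4096, 4106, "b"),
]


def build_index(rows, use_row_offsets):
    # Map each key to the set of distinct offsets recorded for it.
    index = defaultdict(set)
    for block_offset, row_offset, key in rows:
        index[key].add(row_offset if use_row_offsets else block_offset)
    return index


def count_from_index(index):
    # Assumed rewrite: count rows per key by counting stored offsets.
    return {key: len(offsets) for key, offsets in index.items()}


# Ground truth, computed by scanning the base table.
true_counts = defaultdict(int)
for _, _, key in rows:
    true_counts[key] += 1

block_counts = count_from_index(build_index(rows, use_row_offsets=False))
row_counts = count_from_index(build_index(rows, use_row_offsets=True))
```

With distinct block offsets, key "a" has three rows but only two offsets, so 
a count derived from the index undercounts. With distinct row offsets the 
counts match the base table.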

> Accelerate query execution using indexes
> ----------------------------------------
>
>                 Key: HIVE-1694
>                 URL: https://issues.apache.org/jira/browse/HIVE-1694
>             Project: Hive
>          Issue Type: New Feature
>          Components: Indexing, Query Processor
>    Affects Versions: 0.7.0
>            Reporter: Nikhil Deshpande
>            Assignee: Nikhil Deshpande
>         Attachments: demo_q1.hql, demo_q2.hql, HIVE-1694_2010-10-28.diff
>
>
> The index building patch (HIVE-417) is checked into trunk. This JIRA issue 
> tracks supporting indexes in Hive compiler & execution engine for SELECT 
> queries.
> This is in ref. to John's comment at
> https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
> on creating separate JIRA issue for tracking index usage in optimizer & query 
> execution.
> The aim of this effort is to use indexes to accelerate query execution (for 
> certain classes of queries). E.g.
> - Filters and range scans (already being worked on by He Yongqiang as part of 
> HIVE-417?)
> - Joins (index based joins)
> - Group By, Order By and other misc cases
> The proposal is multi-step:
> 1. Building index based operators, compiler and execution engine changes
> 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose 
> between index scans, full table scans etc.)
> This JIRA initially focuses on the first step. This JIRA is expected to hold 
> the information about index based plans & operator implementations for the 
> above-mentioned cases. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
