[jira] Commented: (HIVE-1694) Accelerate query execution using indexes

Prajakta Kalmegh (JIRA) Tue, 07 Dec 2010 21:22:37 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969186#action_12969186
 ]


Prajakta Kalmegh commented on HIVE-1694:
----------------------------------------

Hi,

I am Prajakta from Persistent Systems Ltd., and I am working with Nikhil and 
Prafulla on the changes that John and Namit have suggested above.
This is a design note on how we are implementing those review comments.

We're implementing the following changes as a single transformation in the 
optimizer:
    (1) Table replacement: involves modifying some internal members of 
TableScanOperator.
    (2) Group-by removal: involves removing some operators (GBY-RS-GBY), where 
the group-by is done on both the mapper and the reducer side, and resetting 
the correct parent and child operators within the DAG.
    (3) Sub-query insertion: involves creating a new DAG for the sub-query and 
attaching it to the original DAG at an appropriate place.
    (4) Projection modification: involves steps similar to (3).
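
To make change (2) concrete, here is a minimal sketch of removing a GBY-RS-GBY 
run from a linear operator chain and reconnecting the operators around it. It 
is an illustration only: the Op class and the connect(), chain(), 
removeGroupBy() and render() helpers are hypothetical stand-ins written for 
this note, not Hive's Operator API, and the sketch assumes every operator has 
a single child.

```java
import java.util.ArrayList;
import java.util.List;

class OperatorDagSketch {
    /** Toy operator node (hypothetical; not Hive's Operator class). */
    static final class Op {
        final String name;
        final List<Op> parents = new ArrayList<>();
        final List<Op> children = new ArrayList<>();
        Op(String name) { this.name = name; }
    }

    /** Links parent and child in both directions. */
    static void connect(Op parent, Op child) {
        parent.children.add(child);
        child.parents.add(parent);
    }

    /** Builds a linear chain of operators from the given names; returns the root. */
    static Op chain(String... names) {
        Op root = new Op(names[0]);
        Op prev = root;
        for (int i = 1; i < names.length; i++) {
            Op next = new Op(names[i]);
            connect(prev, next);
            prev = next;
        }
        return root;
    }

    /** True if 'a' starts a GBY-RS-GBY run of single-child operators. */
    static boolean startsGroupByRun(Op a) {
        if (!a.name.equals("GBY") || a.children.size() != 1) return false;
        Op b = a.children.get(0);
        if (!b.name.equals("RS") || b.children.size() != 1) return false;
        return b.children.get(0).name.equals("GBY");
    }

    /**
     * Removes the first GBY-RS-GBY run below 'root' and reconnects the
     * operator above the run to the operators below it.
     * Returns true if a run was found and removed.
     */
    static boolean removeGroupBy(Op root) {
        Op cur = root;
        while (cur.children.size() == 1) {
            Op first = cur.children.get(0);
            if (startsGroupByRun(first)) {
                Op last = first.children.get(0).children.get(0);
                List<Op> below = new ArrayList<>(last.children);
                // Detach the run from the operator above it.
                cur.children.clear();
                first.parents.clear();
                // Re-parent everything below the run onto 'cur'.
                for (Op child : below) {
                    child.parents.remove(last);
                    connect(cur, child);
                }
                last.children.clear();
                return true;
            }
            cur = first;
        }
        return false;
    }

    /** Renders a linear chain as "A-B-C" for inspection. */
    static String render(Op root) {
        StringBuilder sb = new StringBuilder(root.name);
        Op cur = root;
        while (cur.children.size() == 1) {
            cur = cur.children.get(0);
            sb.append('-').append(cur.name);
        }
        return sb.toString();
    }
}
```

For a chain built as chain("TS", "SEL", "GBY", "RS", "GBY", "SEL", "FS"), 
removeGroupBy() returns true and render() then yields "TS-SEL-SEL-FS". The 
real transformation would also have to fix up row schemas and row resolvers, 
which this sketch ignores.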
    
We have implemented the above changes as a proof of concept. In this 
implementation, the rule is invoked as the first transformation in the 
optimizer code, so that lineage information is computed later as part of the 
Generator transformation. Another reason for applying it as the first 
transformation is that, as of now, the implementation uses the query block (QB) 
information to decide whether the transformation can be applied to the input 
query (similar to the canApplyThisRule() method in the original rewrite code). 
Finally, to make changes (3) and (4), we modify the operator DAG but not the 
original query block (QB). This leaves the QB and the operator DAG out of 
sync.

The major issues with this implementation approach are:
1. The changes listed above require either modifying the operator DAG (in 
case of 2) or creating a new operator DAG (in case of 3 and 4). Creating a 
new DAG requires some hacks in the SemanticAnalyzer code (as in the case of 
the replaceViewReferenceWithDefinition() method, which uses ParseDriver() to 
do the same). However, the relevant methods (like genBodyPlan(...), 
genSelectPlan(...) etc.) are private, which makes changes (3) and (4) all the 
more difficult to implement without access to them.
2. Creating a new DAG requires creating all the associated data structures, 
like QB, ASTNode etc., as this information is necessary to generate the 
operator DAG plan for the sub-queries. This approach would be very similar to 
what we are already doing in the rewrite, i.e., creating a new QB and ASTNode. 
3. Since we are creating a new DAG and appending it to the enclosing query DAG, 
we also need to modify the row schema and row resolvers for the operators.
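
Issue 3 can be illustrated with a small sketch: once a sub-query DAG is 
attached in place of the original parent, an operator in the enclosing query 
may read columns that its new parent no longer emits. Everything below is 
hypothetical and written only for this note (the Op class, its inputs and 
outputs lists, and the column names key, value and cnt); Hive's actual row 
schema and row resolver classes are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SubQueryAttachSketch {
    /** Toy operator: 'inputs' are columns it reads, 'outputs' are columns it emits. */
    static final class Op {
        final String name;
        final List<String> inputs;
        final List<String> outputs;
        Op parent;
        final List<Op> children = new ArrayList<>();

        Op(String name, List<String> inputs, List<String> outputs) {
            this.name = name;
            this.inputs = new ArrayList<>(inputs);
            this.outputs = new ArrayList<>(outputs);
        }
    }

    /** Links parent and child in both directions. */
    static void connect(Op parent, Op child) {
        parent.children.add(child);
        child.parent = parent;
    }

    /** Re-parents 'target' under the last operator of the sub-query DAG. */
    static void attachSubQuery(Op subQueryLeaf, Op target) {
        if (target.parent != null) {
            target.parent.children.remove(target);
        }
        connect(subQueryLeaf, target);
    }

    /** Columns that 'op' reads but that its current parent does not emit. */
    static List<String> unresolvedColumns(Op op) {
        List<String> missing = new ArrayList<>(op.inputs);
        if (op.parent != null) {
            missing.removeAll(op.parent.outputs);
        }
        return missing;
    }

    public static void main(String[] args) {
        Op scan = new Op("TS", Arrays.asList(), Arrays.asList("key", "value"));
        Op filter = new Op("FIL", Arrays.asList("key", "value"), Arrays.asList("key", "value"));
        connect(scan, filter);

        // Hypothetical sub-query result that exposes different columns.
        Op subQueryLeaf = new Op("SEL", Arrays.asList("key"), Arrays.asList("key", "cnt"));
        attachSubQuery(subQueryLeaf, filter);

        // Anything listed here must be remapped before the plan is valid.
        System.out.println(unresolvedColumns(filter));
    }
}
```

In this toy case the filter still expects the value column after the 
sub-query is attached, so its schema has to be rewritten against the new 
parent before the combined DAG is consistent.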

One question that needs to be settled before finalizing the above approach is 
whether the cost-based optimizer, which is to be implemented in the future, 
will work on the query block or on the operator DAG. If it works on the 
operator DAG, then the implementation changes listed here will have to be 
made. However, if the cost-based optimizer is to work on the query block, 
then we feel that the initial query rewrite engine code, which ran after 
semantic analysis but before plan generation, can be made to work with the 
cost-based optimizer. It would be a valuable input if you could comment on 
the plans for the cost-based optimizer.
        

> Accelerate query execution using indexes
> ----------------------------------------
>
>                 Key: HIVE-1694
>                 URL: https://issues.apache.org/jira/browse/HIVE-1694
>             Project: Hive
>          Issue Type: New Feature
>          Components: Indexing, Query Processor
>    Affects Versions: 0.7.0
>            Reporter: Nikhil Deshpande
>            Assignee: Nikhil Deshpande
>         Attachments: demo_q1.hql, demo_q2.hql, HIVE-1694_2010-10-28.diff
>
>
> The index building patch (Hive-417) is checked into trunk, this JIRA issue 
> tracks supporting indexes in Hive compiler & execution engine for SELECT 
> queries.
> This is in ref. to John's comment at
> https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
> on creating separate JIRA issue for tracking index usage in optimizer & query 
> execution.
> The aim of this effort is to use indexes to accelerate query execution (for 
> certain class of queries). E.g.
> - Filters and range scans (already being worked on by He Yongqiang as part of 
> HIVE-417?)
> - Joins (index based joins)
> - Group By, Order By and other misc cases
> The proposal is multi-step:
> 1. Building index based operators, compiler and execution engine changes
> 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose 
> between index scans, full table scans etc.)
> This JIRA initially focuses on the first step. This JIRA is expected to hold 
> the information about index based plans & operator implementations for above 
> mentioned cases. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
