[
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969186#action_12969186
]
Prajakta Kalmegh commented on HIVE-1694:
----------------------------------------
Hi,
I am Prajakta from Persistent Systems Ltd., and I am working with Nikhil and
Prafulla on the changes that John and Namit suggested above.
This design note describes how we plan to implement those review comments.
We are implementing the following changes as a single transformation in the
optimizer:
(1) Table replacement: involves modification of some internal members of
TableScanOperator.
(2) Group-by removal: involves removing the GBY-RS-GBY operators (where the
group-by is performed on both the mapper and the reducer side) and resetting
the correct parent and child operators within the DAG.
(3) Sub-query insertion: involves creation of new DAG for sub-query and
attaching it to the original DAG at an appropriate place.
(4) Projection modification: involves steps similar to (3).
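As an illustration of change (2), the following is a minimal sketch of
splicing a linear GBY-RS-GBY chain out of an operator DAG and reconnecting its
parent and child. The Op class and the connect, spliceOut and describe methods
are hypothetical stand-ins written for this note; they are not Hive's Operator
API, and the sketch assumes the removed chain has exactly one parent and one
child.

```java
import java.util.ArrayList;
import java.util.List;

public class GroupByRemovalSketch {
    /** Hypothetical operator node; NOT Hive's Operator class. */
    static final class Op {
        final String name;
        final List<Op> parents = new ArrayList<>();
        final List<Op> children = new ArrayList<>();

        Op(String name) {
            this.name = name;
        }
    }

    /** Adds a parent -> child edge in both directions. */
    static void connect(Op parent, Op child) {
        parent.children.add(child);
        child.parents.add(parent);
    }

    /**
     * Removes the linear chain first..last (inclusive) and links the parent
     * of `first` directly to the child of `last`.
     * Assumption: `first` has one parent and `last` has one child.
     */
    static void spliceOut(Op first, Op last) {
        Op parent = first.parents.get(0);
        Op child = last.children.get(0);
        // Replace in place so the order of sibling edges is preserved.
        parent.children.set(parent.children.indexOf(first), child);
        child.parents.set(child.parents.indexOf(last), parent);
        // Detach the removed chain from the rest of the DAG.
        first.parents.clear();
        last.children.clear();
    }

    /** Renders a single-child chain starting at root, e.g. "TS-SEL-FS". */
    static String describe(Op root) {
        StringBuilder sb = new StringBuilder(root.name);
        Op cur = root;
        while (!cur.children.isEmpty()) {
            cur = cur.children.get(0);
            sb.append('-').append(cur.name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Op ts = new Op("TS");
        Op sel = new Op("SEL");
        Op mapGby = new Op("GBY");
        Op rs = new Op("RS");
        Op reduceGby = new Op("GBY");
        Op fs = new Op("FS");
        connect(ts, sel);
        connect(sel, mapGby);
        connect(mapGby, rs);
        connect(rs, reduceGby);
        connect(reduceGby, fs);

        System.out.println(describe(ts)); // TS-SEL-GBY-RS-GBY-FS
        spliceOut(mapGby, reduceGby);
        System.out.println(describe(ts)); // TS-SEL-FS
    }
}
```

In a real plan the operators downstream of the removed chain would also need
their column references checked, which this sketch does not attempt.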
We have implemented the above changes as a proof of concept. In this
implementation, we have invoked this rule as the first transformation in the
optimizer code so that lineage information is computed later as part of the
Generator transformation. Another reason that we have applied this as the first
transformation is that, as of now, the implementation uses the query block (QB)
information to decide if the transformation can be applied for the input query
(similar to the canApplyThisRule() method in the original rewrite code).
Finally, to make changes (3) and (4), we modify the operator DAG but not the
original query block (QB). This leaves the QB and the operator DAG out of
sync.
The major issues in this implementation approach are:
1. The changes listed above require either modifying the operator DAG (in
case of 2) or creating a new operator DAG (in case of 3 and 4). Creating a new
DAG requires some hacks in the SemanticAnalyzer code (as in the
replaceViewReferenceWithDefinition() method, which uses ParseDriver() to do
the same). However, the relevant methods, such as genBodyPlan(...) and
genSelectPlan(...), are private, which makes it all the more difficult to
implement changes (3) and (4).
2. Creating a new DAG will require creating all associated data structures,
such as the QB and ASTNode, because this information is needed to generate the
operator plan for the sub-queries. This approach would be very similar to what
we already do in the rewrite, i.e., creating a new QB and ASTNode.
3. Since we are creating a new DAG and appending it to the enclosing query DAG,
we also need to modify the row schema and row resolvers for the operators.
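To illustrate issue 3, here is a toy sketch of refreshing column schemas after
a sub-query DAG is attached above existing operators. The Op class, the
passThrough flag, and the column names are assumptions made for this example
only; Hive's actual RowSchema and RowResolver handling is more involved and is
not modeled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SchemaPropagationSketch {
    /** Hypothetical operator node with output column names; NOT Hive's API. */
    static final class Op {
        final String name;
        // true if the operator forwards its parent's columns unchanged
        final boolean passThrough;
        List<String> schema;
        final List<Op> children = new ArrayList<>();

        Op(String name, boolean passThrough, List<String> schema) {
            this.name = name;
            this.passThrough = passThrough;
            this.schema = new ArrayList<>(schema);
        }
    }

    /** Attaches child under newParent, then refreshes downstream schemas. */
    static void attach(Op newParent, Op child) {
        newParent.children.add(child);
        propagate(newParent);
    }

    /**
     * Copies the parent's schema into every pass-through descendant.
     * Operators that define their own output columns are left untouched,
     * and propagation stops at them.
     */
    static void propagate(Op parent) {
        for (Op child : parent.children) {
            if (child.passThrough) {
                child.schema = new ArrayList<>(parent.schema);
                propagate(child);
            }
        }
    }

    public static void main(String[] args) {
        // Existing operators, originally fed by a scan of the base table.
        Op fil = new Op("FIL", true, Arrays.asList("key", "value"));
        Op fs = new Op("FS", true, Arrays.asList("key", "value"));
        fil.children.add(fs);

        // Root of the new sub-query DAG, with a different output schema.
        Op subquerySel = new Op("SEL", false, Arrays.asList("key", "cnt"));
        attach(subquerySel, fil);

        System.out.println(fil.schema + " " + fs.schema); // [key, cnt] [key, cnt]
    }
}
```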
One open question before we finalize the above approach is whether the
cost-based optimizer, which is to be implemented in the future, will work on
the query block or on the operator DAG. If it works on the operator DAG, then
the implementation changes we listed here will be necessary. However, if the
cost-based optimizer is to work on the query block, then we feel that the
initial query rewrite engine code, which ran after semantic analysis but
before plan generation, can be made to work with the cost-based optimizer.
Your comments on the cost-based optimizer would be valuable input for us.
> Accelerate query execution using indexes
> ----------------------------------------
>
> Key: HIVE-1694
> URL: https://issues.apache.org/jira/browse/HIVE-1694
> Project: Hive
> Issue Type: New Feature
> Components: Indexing, Query Processor
> Affects Versions: 0.7.0
> Reporter: Nikhil Deshpande
> Assignee: Nikhil Deshpande
> Attachments: demo_q1.hql, demo_q2.hql, HIVE-1694_2010-10-28.diff
>
>
> The index building patch (HIVE-417) is checked into trunk; this JIRA issue
> tracks supporting indexes in Hive compiler & execution engine for SELECT
> queries.
> This is in ref. to John's comment at
> https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
> on creating separate JIRA issue for tracking index usage in optimizer & query
> execution.
> The aim of this effort is to use indexes to accelerate query execution (for
> certain class of queries). E.g.
> - Filters and range scans (already being worked on by He Yongqiang as part of
> HIVE-417?)
> - Joins (index based joins)
> - Group By, Order By and other misc cases
> The proposal is multi-step:
> 1. Building index based operators, compiler and execution engine changes
> 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose
> between index scans, full table scans etc.)
> This JIRA initially focuses on the first step. This JIRA is expected to hold
> the information about index based plans & operator implementations for above
> mentioned cases.