[
https://issues.apache.org/jira/browse/HIVE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989389#comment-12989389
]
Namit Jain commented on HIVE-1938:
----------------------------------
Currently, Hive does not maintain statistics (such as the number of distinct
values per table/partition), which are the basis for the
cost model in this discussion.
Do you want to work on collecting such statistics first, and then we can use
them for various plan optimizations?
I can think of some advantages of the cost model right away (and I am sure
there are many more):
1. Predict "progress" for a query, predict the time taken.
2. Determine the join order.
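
To make point 2 concrete, here is a rough sketch of how per-table statistics could drive join ordering. It is not Hive code: the table names, row counts, and distinct-value counts are made up, all tables are assumed to join on one shared key, and the estimate is the common textbook formula |R join S| ~ |R| * |S| / max(V(R), V(S)). The cost of a left-deep plan is taken to be the sum of the estimated intermediate result sizes.

```python
from itertools import permutations

# Hypothetical statistics (illustration only): row count and number of
# distinct values (ndv) of the shared join key for each table.
STATS = {
    "a": {"rows": 1_000_000, "ndv": 1_000},
    "b": {"rows": 10_000, "ndv": 1_000},
    "c": {"rows": 100, "ndv": 10},
}


def estimate_join(left, right):
    """Estimate the size of an equi-join on the shared key.

    Textbook estimate: rows = |L| * |R| / max(ndv_L, ndv_R).
    The result is assumed to keep min(ndv_L, ndv_R) distinct keys.
    """
    rows = left["rows"] * right["rows"] / max(left["ndv"], right["ndv"])
    return {"rows": rows, "ndv": min(left["ndv"], right["ndv"])}


def plan_cost(order, stats):
    """Cost of a left-deep plan: sum of estimated intermediate result sizes."""
    current = stats[order[0]]
    cost = 0
    for name in order[1:]:
        current = estimate_join(current, stats[name])
        cost += current["rows"]
    return cost


def best_join_order(stats):
    """Exhaustively try every left-deep order and return the cheapest one."""
    return min(permutations(sorted(stats)), key=lambda o: plan_cost(o, stats))


if __name__ == "__main__":
    order = best_join_order(STATS)
    print(order, plan_cost(order, STATS))
```

With these made-up numbers, joining the two small tables first and the large table last gives the lowest estimated cost. Exhaustive enumeration is only workable for a handful of tables; a real optimizer would need a smarter search and per-column statistics.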
> Cost Based Query optimization for Joins in Hive
> -----------------------------------------------
>
> Key: HIVE-1938
> URL: https://issues.apache.org/jira/browse/HIVE-1938
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Environment: *nix,java
> Reporter: bharath v
> Assignee: bharath v
>
> Current optimization in Hive is just rule-based and involves applying a set
> of rules on the Plan tree. This depends on hints given by the user (which may
> or may not be correct) and might result in execution of costlier plans. So
> this jira aims at building a cost model which can give a good estimate of the
> cost of various plans beforehand (using some metadata already collected), so
> that we can choose the best plan, the one which incurs the least cost.