[jira] [Updated] (HIVE-3027) The optimizer architecture of Hive is terrible, need code refactoring

Dixin Tang (JIRA) Wed, 10 Dec 2014 01:48:57 -0800

     [ https://issues.apache.org/jira/browse/HIVE-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Dixin Tang updated HIVE-3027:
-----------------------------
    Labels: architecture optimizer  (was: architecture optimizer ysmart)

> The optimizer architecture of Hive is terrible, need code refactoring
> ---------------------------------------------------------------------
>
>                 Key: HIVE-3027
>                 URL: https://issues.apache.org/jira/browse/HIVE-3027
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.4.0, 0.4.1, 0.5.0, 0.6.0, 0.7.0, 0.7.1, 0.8.0, 0.8.1
>            Reporter: anders
>              Labels: architecture, optimizer
>
> I want to add complete cost-based optimization to Hive, but when I began the 
> work I found it very difficult to do within the current Hive optimization 
> framework. In the current Hive code, all optimizations are done after the 
> DAG of operators has been generated. It is an awful design and makes me mad. 
> For example, the map-side optimization scans the whole operator DAG, tries 
> to find operators that can be replaced by map-side operations, and then 
> replaces them. How terrible and stupid the code is!!! It runs to 1000 lines 
> and only implements the map-side optimizations!!! 
> In my opinion, optimization shouldn't be done as a separate step; different 
> optimizations should be done at the appropriate time. For example, join 
> reordering should be done when we parse the input query, and we can generate 
> Map-Reduce operators or only a Map operator for each join according to the 
> cost estimate. In the same process we can merge joins and aggregations, and 
> we should push down predicates at the proper time and build a proper data 
> structure to ensure that the cost-estimation module can fetch the 
> corresponding predicates of each base table when estimating JOIN cost. How 
> concise and graceful the code would be if we did the optimization this 
> way!!! But now, in order to comply with the optimizer framework of Hive, I 
> have to write lots of ugly code with amazing redundancy, and the code is 
> very, very difficult to debug!!!! There is now a patch for a cost-based JOIN 
> reorder and merge optimizer called YSMART; I glanced at it. It uses 6000+ 
> lines of code and is difficult to read!! And its optimization is incomplete.
> The optimizer architecture of Hive is terrible. What can I do now?
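
The idea in the description (push predicates down to base tables first, then
let cost estimation pick the join order and the join strategy) can be sketched
roughly as follows. This is only an illustration, not Hive code: the Table
class, plan_joins, the MAP_JOIN_MAX_ROWS threshold, and the crude cardinality
model are all hypothetical names and assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: an input estimated at or below this many rows is
# assumed to fit in memory, so the join can run map-side only.
MAP_JOIN_MAX_ROWS = 100_000


@dataclass
class Table:
    name: str
    rows: int
    # Selectivities (0..1) of predicates already pushed down to this table.
    selectivities: list = field(default_factory=list)

    def estimated_rows(self) -> float:
        """Row count after applying the pushed-down predicates."""
        est = float(self.rows)
        for s in self.selectivities:
            est *= s
        return est


def plan_joins(tables):
    """Greedy left-deep join order: smallest estimated input first.

    Returns a list of (left, right, strategy) tuples.
    """
    ordered = sorted(tables, key=lambda t: t.estimated_rows())
    plan = []
    left_name = ordered[0].name
    left_rows = ordered[0].estimated_rows()
    for right in ordered[1:]:
        right_rows = right.estimated_rows()
        smaller = min(left_rows, right_rows)
        strategy = "map-join" if smaller <= MAP_JOIN_MAX_ROWS else "reduce-join"
        plan.append((left_name, right.name, strategy))
        # Crude assumption: the join output is as large as the larger input.
        left_name = f"({left_name} JOIN {right.name})"
        left_rows = max(left_rows, right_rows)
    return plan


if __name__ == "__main__":
    tables = [
        Table("orders", 10_000_000, [0.5]),
        Table("customers", 1_000_000),
        Table("nation", 25),
    ]
    for step in plan_joins(tables):
        print(step)
```

Because the predicates are attached to each base table before planning, the
cost estimate for every join can read them directly, which is the property the
description asks for. A real optimizer would need proper statistics and a
better cardinality model than the one assumed here.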



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
