[ https://issues.apache.org/jira/browse/HIVE-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dixin Tang updated HIVE-3027:
-----------------------------
    Labels: architecture optimizer  (was: architecture optimizer ysmart)

> The optimizer architecture of Hive is terrible, need code refactoring
> ---------------------------------------------------------------------
>
>                 Key: HIVE-3027
>                 URL: https://issues.apache.org/jira/browse/HIVE-3027
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.4.0, 0.4.1, 0.5.0, 0.6.0, 0.7.0, 0.7.1, 0.8.0, 0.8.1
>            Reporter: anders
>              Labels: architecture, optimizer
>
> I want to add complete cost-based optimization to Hive, but when I began
> the work I found it very difficult to do within the current Hive
> optimization framework. In the current code, all optimizations are done
> after the DAG of operators has been generated. This is an awful design and
> it makes me mad. For example, the map-side optimization scans the whole
> operator DAG, tries to find operators that can be replaced by a map-side
> operation, and then replaces them. How terrible and stupid that code is!
> It runs to 1000 lines and implements only the map-side optimizations!
> In my opinion, optimization shouldn't be done in a separate step; each
> optimization should be done at the appropriate time. For example, join
> reordering should be done when we parse the input query, and for each join
> we can generate either Map-Reduce operators or only a Map operator
> according to the cost estimate. In the same process we can merge joins and
> aggregations, and we should push down predicates at the proper time and
> generate a proper data structure, to ensure that the cost-estimation
> module can fetch the predicates of each base table when estimating JOIN
> cost. How concise and graceful the code would be if we did the
> optimization this way! But now, in order to comply with the optimizer
> framework of Hive, I have to write lots of ugly, highly redundant code,
> and that code is very difficult to debug!
> There is now a patch with a cost-based JOIN reorder and merge optimizer
> called YSMART, and I glanced at it. It uses 6000+ lines of code and is
> difficult to read! Its optimization is also incomplete.
> The optimizer architecture of Hive is terrible. What can I do now?
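
To make the reporter's proposal more concrete, here is a minimal sketch of choosing a join order and a per-join strategy from cost estimates at planning time, instead of rewriting an operator DAG afterwards. It is not Hive code and not the YSMART patch. Every name in it (`Table`, `choose_join_strategy`, `plan_joins`, `MAP_JOIN_MAX_ROWS`), the greedy smallest-first ordering, and the toy cost model are assumptions made purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold (not a Hive setting): an input with at most this many
# estimated rows is assumed small enough for a map-only join.
MAP_JOIN_MAX_ROWS = 10_000


@dataclass(frozen=True)
class Table:
    name: str
    rows: int                 # estimated row count before filtering
    selectivity: float = 1.0  # fraction kept by pushed-down predicates

    @property
    def estimated_rows(self) -> int:
        # Predicates are applied per base table before any join is costed.
        return max(1, round(self.rows * self.selectivity))


def choose_join_strategy(left_rows: int, right_rows: int) -> str:
    """Pick a map-only join when the smaller input is under the threshold."""
    if min(left_rows, right_rows) <= MAP_JOIN_MAX_ROWS:
        return "map-join"
    return "reduce-join"


def plan_joins(tables, join_selectivity=0.001):
    """Greedy ordering: start from the smallest filtered table and join the
    next smallest each time. Returns (steps, total_cost), where each step is
    (left_name, right_name, strategy)."""
    if not tables:
        return [], 0
    ordered = sorted(tables, key=lambda t: t.estimated_rows)
    current_name = ordered[0].name
    current_rows = ordered[0].estimated_rows
    steps, total_cost = [], 0
    for table in ordered[1:]:
        right_rows = table.estimated_rows
        strategy = choose_join_strategy(current_rows, right_rows)
        # Toy cost model: the number of rows this join has to read.
        total_cost += current_rows + right_rows
        steps.append((current_name, table.name, strategy))
        # Assumed fixed join selectivity for the intermediate result size.
        current_rows = max(1, round(current_rows * right_rows * join_selectivity))
        current_name = f"({current_name} JOIN {table.name})"
    return steps, total_cost


if __name__ == "__main__":
    demo = [
        Table("orders", 1_000_000, selectivity=0.5),
        Table("regions", 25),
        Table("customers", 50_000),
    ]
    for left, right, strategy in plan_joins(demo)[0]:
        print(f"{left} JOIN {right}: {strategy}")
```

In this sketch the planner sees per-table predicate selectivity before it costs any join, which is the data-structure requirement the report describes. A real optimizer would need column statistics, join-graph constraints, and a search strategy better than greedy ordering.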