-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/
-----------------------------------------------------------

(Updated Sept. 18, 2012, 5:43 p.m.)


Review request for hive.


Changes
-------

Bug fix + 3 test cases.


Description
-------

This optimizer exploits intra-query correlations and merges multiple correlated MapReduce jobs into one job. I opened a new request because I have been working on hive-git.


This addresses bug HIVE-2206.
    https://issues.apache.org/jira/browse/HIVE-2206


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2693663
  ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 283d0b6
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 8669051
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 05a399d
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 0c22141
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 919a140
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 1a40630
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1469325
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 6bc5fe4
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java f292131
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 8bacd3d
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 63e8ff2
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 5f38bf2
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 16eb125
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9a95efd
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 142f040
  ql/src/test/queries/clientpositive/correlationoptimizer1.q PRE-CREATION
  ql/src/test/queries/clientpositive/correlationoptimizer2.q PRE-CREATION
  ql/src/test/queries/clientpositive/correlationoptimizer3.q PRE-CREATION
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out PRE-CREATION
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out PRE-CREATION
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out PRE-CREATION
  ql/src/test/results/compiler/plan/groupby1.q.xml 4382252
  ql/src/test/results/compiler/plan/groupby2.q.xml eef669c
  ql/src/test/results/compiler/plan/groupby3.q.xml 9743480
  ql/src/test/results/compiler/plan/groupby5.q.xml 8e07860

Diff: https://reviews.apache.org/r/7126/diff/


Testing
-------

I could not test TestHBaseMinimrCliDriver, TestHBaseCliDriver, TestHBaseNegativeCliDriver, testSynchronized in TestEmbeddedHiveMetaStore, testSynchronized in TestRemoteHiveMetaStore, testSynchronized in TestSetUGIOnBothClientServer, testSynchronized in TestSetUGIOnOnlyClient, testSynchronized in TestSetUGIOnOnlyServer, and testNegativeCliDriver_local_mapred_error_cache in TestNegativeCliDriver. This patch should pass all other tests.

When the optimizer is enabled (it is currently disabled by default), several test cases fail. 1 case is optimized by the optimizer. 1 case is not suitable for this correlation optimizer. 2 cases are due to potential bugs in trunk. The other failures are parsing cases (XML plans). Those failures are due to my minor changes in SemanticAnalyzer, which generate several redundant operators for the correlation optimizer. Overall, those failures are not very relevant to the patch. Please see https://issues.apache.org/jira/browse/HIVE-2206?focusedCommentId=13456171&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13456171 for details.


Thanks,

Yin Huai