> On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote: > > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java, > > line 275 > > <https://reviews.apache.org/r/7126/diff/11/?file=221827#file221827line275> > > > > Does this imply CLSRSOperator cannot have more than one child operator > > in any case. If so, can you please add comments stating that along with > > small description why is that? > > Yin Huai wrote: > I thought that, in the original plan, a ReduceSinkOperator can only have > 1 child. Because a CLSRSOperator replaces a ReduceSinkOperator, it also has a > single child. Is my understanding correct? > > Ashutosh Chauhan wrote: > I think you are correct. Since its a terminal operator, RSOperator can > only have 1 child. It will be good to add this as comment there.
done > On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote: > > ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java, line 1451 > > <https://reviews.apache.org/r/7126/diff/11/?file=221831#file221831line1451> > > > > I don't get the reason of introducing rowNumber in Operator. It doesn't > > look its required for optimization to take place correctly. Can you > > elaborate the need of this variable and different associated methods which > > are introduced along with it? If a query plan is optimized by correlation optimizer, multiple operation paths can be merged to a single Map phase. CorrelationCompositeOperator is used to evaluate results from different paths and forward a single row to ReduceSinkOperator. CorrelationCompositeOperator will first buffer rows forwarded to it from different paths. rowNumber is introduced to let CorrelationCompositeOperator know when a row has been processed by all paths. When a new rowNumber (a new row will come) is set through setRowNumber, CorrelationCompositeOperator will evaluate its current buffer and tag which operation paths this current row belongs to. I will add comment to explain it. > On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote: > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java, > > line 134 > > <https://reviews.apache.org/r/7126/diff/11/?file=221835#file221835line134> > > > > Can you add comments clarifying whats the reason for this? done > On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote: > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java, > > line 270 > > <https://reviews.apache.org/r/7126/diff/11/?file=221835#file221835line270> > > > > If I am reading this correctly than, map-side groupbys are converted to > > reduce-side groupbys. I don't see any fundamental reason why correlation > > optimizer cannot work with map-side groupbys. Is this just the limitation > > of current implementation or is there any fundamental reason for it. Please > > add comments whatever the case is. For example, if an MR job for an aggregation operator shares the input table with an MR job for a join operator, with Reduce-side aggregation (RS-GBY pattern), the ReduceSinkOperator for the merged Map phase only needs to emit a single row for both the aggregation operator and the join operator. Thus, I decide to convert map-side groupbys to reduce-side groupbys. > On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote: > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java, line 94 > > <https://reviews.apache.org/r/7126/diff/11/?file=221838#file221838line94> > > > > Can you add comments why it is not compatible with skew-join and > > groupby-skew optimizations? seems skew optimizations will split a single MR job to 2 jobs. I have not carefully thought how to apply the correlation optimization on those plans optimized by skew optimizations. So, I'd suggest we evaluate this issue in a follow-up jira. > On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote: > > ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java, line 393 > > <https://reviews.apache.org/r/7126/diff/11/?file=221845#file221845line393> > > > > Explain annotation should have worked, no ? Having this working will be > > very useful for debugging. Enabled. We need to re-generate some test results (probably all results involving "EXPLAIN"?). > On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote: > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java, > > line 175 > > <https://reviews.apache.org/r/7126/diff/11/?file=221835#file221835line175> > > > > Seems like the logic of auto conversion of join to map-join is same as > > the one used in CommonJoinResolver. If thats the case, instead of > > replicating the logic here, it will be good to refactor that logic out of > > CommonJoinResolver in some util function in some util class and then use > > that function here. That logic will likely change over course of time and > > risk getting diverged if we duplicate instead of reuse. i am working on it. > On Jan. 10, 2013, 2:24 a.m., Ashutosh Chauhan wrote: > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java, > > line 342 > > <https://reviews.apache.org/r/7126/diff/11/?file=221835#file221835line342> > > > > Instead of changing the plan first and than undo the changes on the > > plan later on (in case optimizer can not be applied) I am wondering a > > safer choice could have been to clone the plan and than do your changes on > > the cloned plan. If we find that plan can be optimized, only than change > > the original plan. This will be a bit more safer. Thoughts? Yes, I agree. One question. Is there any convenient way to clone a ParseContext or an entire plan tree? - Yin ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7126/#review15124 ----------------------------------------------------------- On Nov. 19, 2012, 7:51 p.m., Yin Huai wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/7126/ > ----------------------------------------------------------- > > (Updated Nov. 19, 2012, 7:51 p.m.) > > > Review request for hive. > > > Description > ------- > > This optimizer exploits intra-query correlations and merges multiple > correlated MapReduce jobs into one jobs. Open a new request since I have been > working on hive-git. > > > This addresses bug HIVE-2206. > https://issues.apache.org/jira/browse/HIVE-2206 > > > Diffs > ----- > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9fa9525 > conf/hive-default.xml.template f332f3a > > ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java > 7c4c413 > ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java > PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 18a9bd2 > ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 46daeb2 > ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 68302f8 > ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 0c22141 > ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 919a140 > ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1469325 > ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java > PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java edde378 > ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java d1555e2 > ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 2bf284d > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 330aa52 > ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java > PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java > PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 5a9f064 > ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java b33d616 > ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9a95efd > ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 6f8bc47 > ql/src/test/queries/clientpositive/correlationoptimizer1.q PRE-CREATION > ql/src/test/queries/clientpositive/correlationoptimizer2.q PRE-CREATION > ql/src/test/queries/clientpositive/correlationoptimizer3.q PRE-CREATION > ql/src/test/queries/clientpositive/correlationoptimizer4.q PRE-CREATION > ql/src/test/queries/clientpositive/correlationoptimizer5.q PRE-CREATION > ql/src/test/results/clientpositive/correlationoptimizer1.q.out PRE-CREATION > ql/src/test/results/clientpositive/correlationoptimizer2.q.out PRE-CREATION > ql/src/test/results/clientpositive/correlationoptimizer3.q.out PRE-CREATION > ql/src/test/results/clientpositive/correlationoptimizer4.q.out PRE-CREATION > ql/src/test/results/clientpositive/correlationoptimizer5.q.out PRE-CREATION > ql/src/test/results/compiler/plan/groupby1.q.xml cd0d6e4 > ql/src/test/results/compiler/plan/groupby2.q.xml 7b07f02 > ql/src/test/results/compiler/plan/groupby3.q.xml a6a1986 > ql/src/test/results/compiler/plan/groupby5.q.xml 25e3583 > > Diff: https://reviews.apache.org/r/7126/diff/ > > > Testing > ------- > > All tests pass. > > > Thanks, > > Yin Huai > >