-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34757/#review88484
-----------------------------------------------------------


!. First round review. Only at a high level.
2. Patch looks very good and clean.
3. It will be better if we can add some test cases for self union, self-join, 
CWE, and repeated sub-queries. This can be a followup task, though.


ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java 
(line 398)
<https://reviews.apache.org/r/34757/#comment141051>

    This doesn't seem efficient in that the comparison will not terminate early 
when result becomes false. Something like this should offers better performance:
    
    if ( compareObject(desc1.getFilterMapString(), desc2.getFilterMapString()) 
&&
         compareObject(desc1.getKeysString(), desc2.getKeysString()) &&
         (desc1.getPosBigTable() == desc2.getPosBigTable()) && ...) {
         return true;
       } else {
         return false;
       }



ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
 (line 101)
<https://reviews.apache.org/r/34757/#comment141052>

    There seems an efficiency issue: once we divide the first level works into 
sets, each set containing equivalent works, we only need to compare the next 
level works for each set. There is no point to compare works from the next 
level across sets. For instance, top level has w1, w2, w3, and w4. We get two 
sets: {w1, w2}, {w3, w4}. To further compare and combine, we only need to 
evaluate the children of w1 and w2 together, and the children for w3 and w4 
tother. However, we don't need to evaluate the children for w1 and the children 
for w3.
    
    Am I missing something?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
 (line 157)
<https://reviews.apache.org/r/34757/#comment141053>

    Could parents be null, in case of top-level works? Same for children.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
 (line 207)
<https://reviews.apache.org/r/34757/#comment141055>

    I think in SparkWork, there couldn't be two parents connectting to the same 
child. UnionWork wold be such a child, but SparkWork doesn't have UnionWork, if 
I'm not mistaken.
    
    I don't think SparkPlan has a limitation of only link between to trans. If 
there are two links between a parent to a child, the input will be self unioned 
and the result is the input to the child.


- Xuefu Zhang


On June 17, 2015, 8:59 a.m., chengxiang li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34757/
> -----------------------------------------------------------
> 
> (Updated June 17, 2015, 8:59 a.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-10844
>     https://issues.apache.org/jira/browse/HIVE-10844
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Some Hive queries(like TPCDS Q39) may share the same subquery, which 
> translated into sperate, but equivalent Works in SparkWork, combining these 
> equivalent Works into a single one would help to benifit from following 
> dynamic RDD caching optimization.
> 
> 
> Diffs
> -----
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
> 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 
>   ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 
>   ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 
>   ql/src/test/results/clientpositive/spark/groupby10.q.out 9d3cf36 
>   ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 
>   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 
>   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 
>   
> ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out
>  2cb126d 
>   ql/src/test/results/clientpositive/spark/groupby8.q.out 307395f 
>   ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out ba04a57 
>   ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 
>   ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 
>   ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef 
>   ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 
>   ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 
>   ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 
>   ql/src/test/results/clientpositive/spark/union28.q.out 98582df 
>   ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 
>   ql/src/test/results/clientpositive/spark/union30.q.out 3409623 
>   ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 
>   ql/src/test/results/clientpositive/spark/union5.q.out afee988 
>   ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 
>   ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab 
>   ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 
>   ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 
>   ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 
>   ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 
>   ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 
>   ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 
>   ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 
>   ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 
>   ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 
>   ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 
>   ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4 
>   ql/src/test/results/clientpositive/spark/union_remove_7.q.out 084fbd6 
>   ql/src/test/results/clientpositive/spark/union_top_level.q.out dede1ef 
> 
> Diff: https://reviews.apache.org/r/34757/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> chengxiang li
> 
>

Reply via email to