-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65174/#review205972
-----------------------------------------------------------




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Lines 2248 (patched)
<https://reviews.apache.org/r/65174/#comment288920>

    Let's enable it by default: though it will be a pain because of the qfile 
changes, it will help us discover any existing issues.
    
    When we pair it with 1) a pushdown rule, and 2) a redundant topN removal 
rule, this should always result in better plans.



ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java
Lines 37 (patched)
<https://reviews.apache.org/r/65174/#comment288914>

    Can we add a short description for the operator?



ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java
Lines 42 (patched)
<https://reviews.apache.org/r/65174/#comment288915>

    Can we add comments for this properties?



ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java
Lines 181 (patched)
<https://reviews.apache.org/r/65174/#comment288921>

    I think this should be _false_ as skew join optimization is only supported 
in few specific cases, and hence no need to override it? If it should be true, 
add a comment explaining it; otherwise, please remove it.



ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java
Lines 185 (patched)
<https://reviews.apache.org/r/65174/#comment288922>

    Same as above, this is only _true_ currently for FilterOp and SelectOp. If 
this should be _true_ here too, please provide comment. Otherwise, we do not 
need to override.



ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java
Lines 190 (patched)
<https://reviews.apache.org/r/65174/#comment288923>

    Same as above, better remove it and enable it afterwards if needed unless 
we are absolutely sure this is correct.



ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java
Lines 195 (patched)
<https://reviews.apache.org/r/65174/#comment288924>

    Same as above, better remove it and enable it afterwards if needed unless 
we are absolutely sure this is correct.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java
Lines 266 (patched)
<https://reviews.apache.org/r/65174/#comment288930>

    Same as for TopNKeyOperator, we should be cautious.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java
Lines 271 (patched)
<https://reviews.apache.org/r/65174/#comment288931>

    Same as for TopNKeyOperator, we should be cautious.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java
Lines 276 (patched)
<https://reviews.apache.org/r/65174/#comment288932>

    Same as for TopNKeyOperator, we should be cautious.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java
Lines 281 (patched)
<https://reviews.apache.org/r/65174/#comment288933>

    Same as for TopNKeyOperator, we should be cautious.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java
Lines 36 (patched)
<https://reviews.apache.org/r/65174/#comment288934>

    Add general comment to the class.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java
Lines 59 (patched)
<https://reviews.apache.org/r/65174/#comment288941>

    Parent operator can be extracted from 
_reduceSinkOperator.getParentOperators().get(0)_.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java
Lines 60 (patched)
<https://reviews.apache.org/r/65174/#comment288942>

    Can we add condition _if parentOp instanceof TopK, bail out_? Safer in case 
we end up visiting same op more than once.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java
Lines 60 (patched)
<https://reviews.apache.org/r/65174/#comment288943>

    Can we add condition _if parentOp instanceof TopN, bail out_? Safer in case 
we visit an operator more than once.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java
Lines 61 (patched)
<https://reviews.apache.org/r/65174/#comment288939>

    This should be equal to _nd_ and _reduceSinkOperator_, no need to create a 
third copy.



ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java
Lines 101 (patched)
<https://reviews.apache.org/r/65174/#comment288944>

    We should use _ExprNodeDescUtils.isSame_ for the expression columns (as in 
_ReduceSinkDesc_).


- Jesús Camacho Rodríguez


On July 11, 2018, 12:30 p.m., Teddy Choi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65174/
> -----------------------------------------------------------
> 
> (Updated July 11, 2018, 12:30 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-17896
>     https://issues.apache.org/jira/browse/HIVE-17896
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the 
> group-by operator buffers up all the rows before discarding the 99% of the 
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the 
> filtering on the shuffle keys, but it is better to do this before breaking 
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following 
> to happen.
> GBY->RS(Top=1)
> can become
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY 
> and consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the 
> GBY is on keys "a,b,c" and the TopNKey is on just "a".
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6ea68c3500 
>   itests/src/test/resources/testconfiguration.properties 9e012ce2f8 
>   
> ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
>  a002348013 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java 
> 71ee25d9e0 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 7bb6590d5e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
> 7afbf04797 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java dfd790853b 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorTopNKeyDesc.java 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/topnkey.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/vector_topnkey.q PRE-CREATION 
>   ql/src/test/results/clientpositive/llap/topnkey.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/llap/vector_topnkey.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/tez/topnkey.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/tez/vector_topnkey.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/topnkey.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/vector_topnkey.q.out PRE-CREATION 
>   
> serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
>  9393fb853f 
> 
> 
> Diff: https://reviews.apache.org/r/65174/diff/3/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Teddy Choi
> 
>

Reply via email to