----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/65174/#review205972 -----------------------------------------------------------
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java Lines 2248 (patched) <https://reviews.apache.org/r/65174/#comment288920> Let's enable it by default: though it will be a pain because of the qfile changes, it will help us discover any existing issues. When we pair it with 1) a pushdown rule, and 2) a redundant topN removal rule, this should always result in better plans. ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java Lines 37 (patched) <https://reviews.apache.org/r/65174/#comment288914> Can we add a short description for the operator? ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java Lines 42 (patched) <https://reviews.apache.org/r/65174/#comment288915> Can we add comments for this properties? ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java Lines 181 (patched) <https://reviews.apache.org/r/65174/#comment288921> I think this should be _false_ as skew join optimization is only supported in few specific cases, and hence no need to override it? If it should be true, add a comment explaining it; otherwise, please remove it. ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java Lines 185 (patched) <https://reviews.apache.org/r/65174/#comment288922> Same as above, this is only _true_ currently for FilterOp and SelectOp. If this should be _true_ here too, please provide comment. Otherwise, we do not need to override. ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java Lines 190 (patched) <https://reviews.apache.org/r/65174/#comment288923> Same as above, better remove it and enable it afterwards if needed unless we are absolutely sure this is correct. ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java Lines 195 (patched) <https://reviews.apache.org/r/65174/#comment288924> Same as above, better remove it and enable it afterwards if needed unless we are absolutely sure this is correct. ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java Lines 266 (patched) <https://reviews.apache.org/r/65174/#comment288930> Same as for TopNKeyOperator, we should be cautious. ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java Lines 271 (patched) <https://reviews.apache.org/r/65174/#comment288931> Same as for TopNKeyOperator, we should be cautious. ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java Lines 276 (patched) <https://reviews.apache.org/r/65174/#comment288932> Same as for TopNKeyOperator, we should be cautious. ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java Lines 281 (patched) <https://reviews.apache.org/r/65174/#comment288933> Same as for TopNKeyOperator, we should be cautious. ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java Lines 36 (patched) <https://reviews.apache.org/r/65174/#comment288934> Add general comment to the class. ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java Lines 59 (patched) <https://reviews.apache.org/r/65174/#comment288941> Parent operator can be extracted from _reduceSinkOperator.getParentOperators().get(0)_. ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java Lines 60 (patched) <https://reviews.apache.org/r/65174/#comment288942> Can we add condition _if parentOp instanceof TopK, bail out_? Safer in case we end up visiting same op more than once. ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java Lines 60 (patched) <https://reviews.apache.org/r/65174/#comment288943> Can we add condition _if parentOp instanceof TopN, bail out_? Safer in case we visit an operator more than once. ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java Lines 61 (patched) <https://reviews.apache.org/r/65174/#comment288939> This should be equal to _nd_ and _reduceSinkOperator_, no need to create a third copy. ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java Lines 101 (patched) <https://reviews.apache.org/r/65174/#comment288944> We should use _ExprNodeDescUtils.isSame_ for the expression columns (as in _ReduceSinkDesc_). - Jesús Camacho Rodríguez On July 11, 2018, 12:30 p.m., Teddy Choi wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/65174/ > ----------------------------------------------------------- > > (Updated July 11, 2018, 12:30 p.m.) > > > Review request for hive. > > > Bugs: HIVE-17896 > https://issues.apache.org/jira/browse/HIVE-17896 > > > Repository: hive-git > > > Description > ------- > > For TPC-DS Query27, the TopN operation is delayed by the group-by - the > group-by operator buffers up all the rows before discarding the 99% of the > rows in the TopN Hash within the ReduceSink Operator. > The RS TopN operator is very restrictive as it only supports doing the > filtering on the shuffle keys, but it is better to do this before breaking > the vectors into rows and losing the isRepeating properties. > Adding a TopN Key operator in the physical operator tree allows the following > to happen. > GBY->RS(Top=1) > can become > TNK(1)->GBY->RS(Top=1) > So that, the TopNKey can remove rows before they are buffered into the GBY > and consume memory. > Here's the equivalent implementation in Presto > https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35 > Adding this as a sub-feature of GroupBy prevents further optimizations if the > GBY is on keys "a,b,c" and the TopNKey is on just "a". > > > Diffs > ----- > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6ea68c3500 > itests/src/test/resources/testconfiguration.properties 9e012ce2f8 > > ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java > a002348013 > ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java > 71ee25d9e0 > ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 7bb6590d5e > ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java > PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java > PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java > 7afbf04797 > ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java dfd790853b > ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/plan/VectorTopNKeyDesc.java > PRE-CREATION > ql/src/test/queries/clientpositive/topnkey.q PRE-CREATION > ql/src/test/queries/clientpositive/vector_topnkey.q PRE-CREATION > ql/src/test/results/clientpositive/llap/topnkey.q.out PRE-CREATION > ql/src/test/results/clientpositive/llap/vector_topnkey.q.out PRE-CREATION > ql/src/test/results/clientpositive/tez/topnkey.q.out PRE-CREATION > ql/src/test/results/clientpositive/tez/vector_topnkey.q.out PRE-CREATION > ql/src/test/results/clientpositive/topnkey.q.out PRE-CREATION > ql/src/test/results/clientpositive/vector_topnkey.q.out PRE-CREATION > > serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java > 9393fb853f > > > Diff: https://reviews.apache.org/r/65174/diff/3/ > > > Testing > ------- > > > Thanks, > > Teddy Choi > >