----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71820/#review218963 -----------------------------------------------------------
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java Line 2386 (original), 2386 (patched) <https://reviews.apache.org/r/71820/#comment306935> Just for reference: Are you planning to enable back the optimization once all other fixes are applied? Or does it have to do with performance concerns for some cases? Even if we disable it here, can we enable it for Perf cli drivers (you can do that by adding the property value in _data/conf/perf-reg/tez/hive-site.xml_? I believe it would be useful to already see how far are we pushing topn in those tpcds queries. ql/src/java/org/apache/hadoop/hive/ql/optimizer/CommonKeyPrefix.java Lines 35 (patched) <https://reviews.apache.org/r/71820/#comment306937> It seems this class can only be created using the static building methods? If that is the case, can you create a private constructor to defeat instantiation? ql/src/java/org/apache/hadoop/hive/ql/optimizer/CommonKeyPrefix.java Lines 37 (patched) <https://reviews.apache.org/r/71820/#comment306936> Can you add comments to each of these static methods explaining shortly how the mapping will be performed? ql/src/java/org/apache/hadoop/hive/ql/optimizer/CommonKeyPrefix.java Lines 93 (patched) <https://reviews.apache.org/r/71820/#comment306938> I believe we should use _ExprNodeDesc.isSame_ here, since equals refers to identify for ExprNodeDesc objects. ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java Line 107 (original), 89 (patched) <https://reviews.apache.org/r/71820/#comment306941> Do we need this since they are already passed when we create the new operator? ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java Line 108 (original), 90 (patched) <https://reviews.apache.org/r/71820/#comment306942> We can just add the child to the child operators instead of creating the list multiple times. ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java Lines 93 (patched) <https://reviews.apache.org/r/71820/#comment306939> _replaceChild_? ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java Lines 95 (patched) <https://reviews.apache.org/r/71820/#comment306940> _replaceParent_? ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyPushdownProcessor.java Lines 18 (patched) <https://reviews.apache.org/r/71820/#comment306948> Let's move all these classes (TopNKeyPushdownProcessor, TopNKeyProcessor, CommonKeyPrefix, etc.) into a new package _org.apache.hadoop.hive.ql.optimizer.topnkey_. Thoughts? ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyPushdownProcessor.java Lines 72 (patched) <https://reviews.apache.org/r/71820/#comment306943> Although it is very simple, can we add this to new method as with the rest of operators? It helps encapsulating so same logic may be used for other simple operators in the future. ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyPushdownProcessor.java Lines 91 (patched) <https://reviews.apache.org/r/71820/#comment306947> Typo ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyPushdownProcessor.java Lines 105 (patched) <https://reviews.apache.org/r/71820/#comment306944> It seems the logic in this method cannot throw a Semantic Exception? ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyPushdownProcessor.java Lines 202 (patched) <https://reviews.apache.org/r/71820/#comment306949> Short comment on this method? _Only LOJ supported_. ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyPushdownProcessor.java Lines 257 (patched) <https://reviews.apache.org/r/71820/#comment306950> Comment? ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyPushdownProcessor.java Lines 258 (patched) <https://reviews.apache.org/r/71820/#comment306951> What happens is the key is the same but the value for top is different? It seems we do not cover that case here but it is important in case you end up with multiple top n operators derived from different RS operators. I believe in that situation we could still remove one of them, but we should keep the smallest value for top n. ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyPushdownProcessor.java Lines 278 (patched) <https://reviews.apache.org/r/71820/#comment306952> Should this method be moved to _CommonKeyPrefix_ too? ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyPushdownProcessor.java Lines 314 (patched) <https://reviews.apache.org/r/71820/#comment306946> Same comment as above: It seems we do not need to create lists to encapsulate other lists here, we can just add the operator to the existing list in the operator? ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java Lines 478 (patched) <https://reviews.apache.org/r/71820/#comment306953> I think we should move this to _runTopNKeyOptimization_, so it can run immediatly after introducing the topN operators in the plan. - Jesús Camacho Rodríguez On Dec. 3, 2019, 9:27 a.m., Krisztian Kasa wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/71820/ > ----------------------------------------------------------- > > (Updated Dec. 3, 2019, 9:27 a.m.) > > > Review request for hive, Jesús Camacho Rodríguez and Zoltan Haindrich. > > > Bugs: HIVE-20150 > https://issues.apache.org/jira/browse/HIVE-20150 > > > Repository: hive-git > > > Description > ------- > > TopNKey pushdown > ================ > 1. Apply patch: > https://issues.apache.org/jira/secure/attachment/12941630/HIVE-20150.11.patch > 2. TopNKey introduction depends only from Reduce Sink with topn property >= 0 > 3. Implement TopNKey operator pushdown through: projection, group by, redeuce > sink, left outer join, other topnkey > 4. Add sort order and null sort order direction check when determining if the > topnkey op can be pushed > 5. Implement handling cases when topnkey op and the parent op has a common > key prefix only. > 6. turn off topnkey optimization by default > > > Diffs > ----- > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 4393a2825e > ql/src/java/org/apache/hadoop/hive/ql/optimizer/CommonKeyPrefix.java > PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java > 0d6cf3c755 > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyPushdownProcessor.java > PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java bf58bd8bb8 > ql/src/test/org/apache/hadoop/hive/ql/optimizer/TestCommonKeyPrefix.java > PRE-CREATION > ql/src/test/queries/clientpositive/topnkey.q 057b6a45ba > ql/src/test/queries/clientpositive/vector_topnkey.q 85c5880cd6 > ql/src/test/results/clientpositive/llap/bucket_groupby.q.out 0c051c926b > ql/src/test/results/clientpositive/llap/check_constraint.q.out 9f2c9a1cd0 > ql/src/test/results/clientpositive/llap/constraints_optimization.q.out > b6d210becf > ql/src/test/results/clientpositive/llap/enforce_constraint_notnull.q.out > 9343e078b7 > ql/src/test/results/clientpositive/llap/explainuser_1.q.out 283a665a20 > ql/src/test/results/clientpositive/llap/explainuser_2.q.out 0219af8833 > ql/src/test/results/clientpositive/llap/external_jdbc_table_perf.q.out > 545cce75a9 > ql/src/test/results/clientpositive/llap/filter_union.q.out 0df77762a0 > ql/src/test/results/clientpositive/llap/limit_pushdown.q.out 3fdd77d802 > ql/src/test/results/clientpositive/llap/limit_pushdown3.q.out efa8c38d7c > ql/src/test/results/clientpositive/llap/llap_decimal64_reader.q.out > ffe5f6fb22 > ql/src/test/results/clientpositive/llap/offset_limit.q.out 23f2de46e5 > ql/src/test/results/clientpositive/llap/offset_limit_ppd_optimizer.q.out > 4ecb7bc46d > ql/src/test/results/clientpositive/llap/orc_struct_type_vectorization.q.out > 0eac389eb7 > > ql/src/test/results/clientpositive/llap/parquet_complex_types_vectorization.q.out > 4362fb6f2e > > ql/src/test/results/clientpositive/llap/parquet_map_type_vectorization.q.out > 24468c9a1b > > ql/src/test/results/clientpositive/llap/parquet_struct_type_vectorization.q.out > 45890a1890 > ql/src/test/results/clientpositive/llap/semijoin_reddedup.q.out 0e9723b8f3 > ql/src/test/results/clientpositive/llap/subquery_ALL.q.out d910c1a79d > ql/src/test/results/clientpositive/llap/subquery_ANY.q.out 91472d631e > ql/src/test/results/clientpositive/llap/topnkey.q.out 1e77587f82 > ql/src/test/results/clientpositive/llap/vector_cast_constant.q.out > cc2dc47280 > ql/src/test/results/clientpositive/llap/vector_char_2.q.out f7e76e5a8b > > ql/src/test/results/clientpositive/llap/vector_groupby_grouping_sets_limit.q.out > 6fd15e7101 > ql/src/test/results/clientpositive/llap/vector_groupby_reduce.q.out > d6325982e3 > ql/src/test/results/clientpositive/llap/vector_mr_diff_schema_alias.q.out > 4d417b9c3d > ql/src/test/results/clientpositive/llap/vector_reduce_groupby_decimal.q.out > 97a211cfc6 > ql/src/test/results/clientpositive/llap/vector_string_concat.q.out > a8019be7aa > ql/src/test/results/clientpositive/llap/vector_topnkey.q.out c140bdfd37 > ql/src/test/results/clientpositive/llap/vectorization_limit.q.out > 7326adf522 > ql/src/test/results/clientpositive/perf/tez/cbo_query14.q.out e9308cd709 > ql/src/test/results/clientpositive/perf/tez/cbo_query77.q.out 02caf99f7d > ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query14.q.out > 43e1b2b5c2 > ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query77.q.out > 2f75361df1 > ql/src/test/results/clientpositive/perf/tez/constraints/query10.q.out > bb3b1b6660 > ql/src/test/results/clientpositive/perf/tez/constraints/query14.q.out > 228b20a8d7 > ql/src/test/results/clientpositive/perf/tez/constraints/query15.q.out > 5268ed3ecf > ql/src/test/results/clientpositive/perf/tez/constraints/query17.q.out > d96222d9e1 > ql/src/test/results/clientpositive/perf/tez/constraints/query25.q.out > adabb76e04 > ql/src/test/results/clientpositive/perf/tez/constraints/query26.q.out > 824bbe6769 > ql/src/test/results/clientpositive/perf/tez/constraints/query27.q.out > abbd02d6c9 > ql/src/test/results/clientpositive/perf/tez/constraints/query29.q.out > c308771dfb > ql/src/test/results/clientpositive/perf/tez/constraints/query35.q.out > 23b3399123 > ql/src/test/results/clientpositive/perf/tez/constraints/query37.q.out > 187ad5c5b5 > ql/src/test/results/clientpositive/perf/tez/constraints/query40.q.out > 070b5cb1f5 > ql/src/test/results/clientpositive/perf/tez/constraints/query43.q.out > b5a6c746d1 > ql/src/test/results/clientpositive/perf/tez/constraints/query45.q.out > 3f5dbf4beb > ql/src/test/results/clientpositive/perf/tez/constraints/query49.q.out > b384aea779 > ql/src/test/results/clientpositive/perf/tez/constraints/query5.q.out > d3f79820f2 > ql/src/test/results/clientpositive/perf/tez/constraints/query50.q.out > 8c9754967f > ql/src/test/results/clientpositive/perf/tez/constraints/query60.q.out > 06a5689938 > ql/src/test/results/clientpositive/perf/tez/constraints/query66.q.out > be612609cf > ql/src/test/results/clientpositive/perf/tez/constraints/query69.q.out > d7469ae5a9 > ql/src/test/results/clientpositive/perf/tez/constraints/query7.q.out > b2eccdbe90 > ql/src/test/results/clientpositive/perf/tez/constraints/query76.q.out > ce4f7cb061 > ql/src/test/results/clientpositive/perf/tez/constraints/query77.q.out > 95ab61bed2 > ql/src/test/results/clientpositive/perf/tez/constraints/query8.q.out > 170bccf406 > ql/src/test/results/clientpositive/perf/tez/constraints/query80.q.out > b18f89373c > ql/src/test/results/clientpositive/perf/tez/constraints/query82.q.out > 8dd6ae9f0f > ql/src/test/results/clientpositive/perf/tez/constraints/query99.q.out > c77a73f4d5 > ql/src/test/results/clientpositive/perf/tez/query10.q.out b346a5c5fb > ql/src/test/results/clientpositive/perf/tez/query14.q.out 069fad2b4a > ql/src/test/results/clientpositive/perf/tez/query15.q.out 3670a718b3 > ql/src/test/results/clientpositive/perf/tez/query17.q.out df70fbc46e > ql/src/test/results/clientpositive/perf/tez/query25.q.out d006795c79 > ql/src/test/results/clientpositive/perf/tez/query26.q.out a1bf3b099b > ql/src/test/results/clientpositive/perf/tez/query27.q.out 6f49de2344 > ql/src/test/results/clientpositive/perf/tez/query29.q.out 5066893829 > ql/src/test/results/clientpositive/perf/tez/query35.q.out 265c51bb72 > ql/src/test/results/clientpositive/perf/tez/query37.q.out 2724fd44dc > ql/src/test/results/clientpositive/perf/tez/query40.q.out 4b65c82e00 > ql/src/test/results/clientpositive/perf/tez/query43.q.out eb19d41926 > ql/src/test/results/clientpositive/perf/tez/query45.q.out 4538a6540d > ql/src/test/results/clientpositive/perf/tez/query49.q.out 9c34eccceb > ql/src/test/results/clientpositive/perf/tez/query5.q.out 38fba27a8e > ql/src/test/results/clientpositive/perf/tez/query50.q.out 6e34831de6 > ql/src/test/results/clientpositive/perf/tez/query60.q.out e77c89ba69 > ql/src/test/results/clientpositive/perf/tez/query66.q.out 7ddcc21f92 > ql/src/test/results/clientpositive/perf/tez/query69.q.out d11b5494e0 > ql/src/test/results/clientpositive/perf/tez/query7.q.out c17ec8aeb9 > ql/src/test/results/clientpositive/perf/tez/query76.q.out c0d60e88cc > ql/src/test/results/clientpositive/perf/tez/query77.q.out ab2b3dc570 > ql/src/test/results/clientpositive/perf/tez/query8.q.out 0af8fdf3df > ql/src/test/results/clientpositive/perf/tez/query80.q.out 47844158fa > ql/src/test/results/clientpositive/perf/tez/query82.q.out c7721acffe > ql/src/test/results/clientpositive/perf/tez/query99.q.out c01122f435 > ql/src/test/results/clientpositive/tez/topnkey.q.out cf2ecf7133 > ql/src/test/results/clientpositive/tez/vector_topnkey.q.out d179013e28 > ql/src/test/results/clientpositive/topnkey.q.out cecbe89b1c > > > Diff: https://reviews.apache.org/r/71820/diff/6/ > > > Testing > ------- > > Run q tests using TestMiniLlapLocalCliDriver > topnkey.q > vector_topnkey.q > > > Thanks, > > Krisztian Kasa > >