----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/60753/#review180494 -----------------------------------------------------------
common/pom.xml Lines 234-238 (patched) <https://reviews.apache.org/r/60753/#comment255712> Since this is also a runtime dependency. This set of changes will be needed in ql/pom.xml where we create exec jar which is shipped to all containers. common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimatorFactory.java Lines 30-34 (patched) <https://reviews.apache.org/r/60753/#comment255700> As discussed this determination should happen using first byte of serialized string. common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java Lines 28-30 (patched) <https://reviews.apache.org/r/60753/#comment255713> Is it really worth to have a dependency on 3rd party jar for this class. Will using TreeMap be really slow? Javolution library on which we already have a dependcy also provides these fast data structures. If really needed, we shall use those instead. common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java Lines 49 (patched) <https://reviews.apache.org/r/60753/#comment255714> ws common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java Lines 569 (patched) <https://reviews.apache.org/r/60753/#comment255715> can remove common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java Lines 594 (patched) <https://reviews.apache.org/r/60753/#comment255716> ws common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLogUtils.java Lines 2 (patched) <https://reviews.apache.org/r/60753/#comment255717> can remove this. common/src/java/org/apache/hadoop/hive/conf/HiveConf.java Line 1721 (original), 1721 (patched) <https://reviews.apache.org/r/60753/#comment255718> Better name: hive.stats.ndv.algo pom.xml Lines 210 (patched) <https://reviews.apache.org/r/60753/#comment255711> Latest version of this lib is 8.1 We might as well use that one. ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DecimalNumDistinctValueEstimator.java Line 34 (original), 32 (patched) <https://reviews.apache.org/r/60753/#comment255709> whitespace (ws) ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DecimalNumDistinctValueEstimator.java Lines 46 (patched) <https://reviews.apache.org/r/60753/#comment255710> whitespace (ws) - Ashutosh Chauhan On July 14, 2017, 12:08 a.m., pengcheng xiong wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/60753/ > ----------------------------------------------------------- > > (Updated July 14, 2017, 12:08 a.m.) > > > Review request for hive, Ashutosh Chauhan and Prasanth_J. > > > Repository: hive-git > > > Description > ------- > > HIVE-16966 > > > Diffs > ----- > > common/pom.xml 023f084511 > > common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimatorFactory.java > PRE-CREATION > common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLConstants.java > PRE-CREATION > common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLDenseRegister.java > PRE-CREATION > common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLRegister.java > PRE-CREATION > > common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java > PRE-CREATION > common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java > PRE-CREATION > common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLogUtils.java > PRE-CREATION > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java c7afe2bc4a > > metastore/src/java/org/apache/hadoop/hive/metastore/NumDistinctValueEstimator.java > 92f9a845e3 > metastore/src/java/org/apache/hadoop/hive/metastore/hbase/StatsCache.java > 18f8afc9ad > > metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/ColumnStatsAggregator.java > 31955b4363 > > metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/ColumnStatsAggregatorFactory.java > daf85692eb > > metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DecimalColumnStatsAggregator.java > 36b2c9c56b > > metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DoubleColumnStatsAggregator.java > a88ef84e5c > > metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/LongColumnStatsAggregator.java > 8ac6561aec > > metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/StringColumnStatsAggregator.java > 2aa4046a46 > > metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMerger.java > 33c7e3e52c > > metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMergerFactory.java > fe890e4e27 > > metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DateColumnStatsMerger.java > 3179b23438 > > metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DecimalColumnStatsMerger.java > c13add9d9c > > metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DoubleColumnStatsMerger.java > fbdba24b0a > > metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/LongColumnStatsMerger.java > ac65590505 > > metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/StringColumnStatsMerger.java > 41587477d3 > > metastore/src/test/org/apache/hadoop/hive/metastore/hbase/TestHBaseAggregateStatsNDVUniformDist.java > 87b1ac870d > pom.xml 32f5fd1493 > > ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java > 0a5cf00c44 > ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java 76f7daeb1b > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DecimalNumDistinctValueEstimator.java > a05906edfa > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DoubleNumDistinctValueEstimator.java > e76fc74dbc > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java > 2ebfcb2360 > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/LongNumDistinctValueEstimator.java > 1c197a028a > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java > fa70f49857 > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/StringNumDistinctValueEstimator.java > 601901c163 > ql/src/test/queries/clientpositive/char_udf1.q 39aa0e0e17 > ql/src/test/queries/clientpositive/compute_stats_date.q 09128f6fb9 > ql/src/test/queries/clientpositive/compute_stats_decimal.q 76e1468ada > ql/src/test/queries/clientpositive/compute_stats_double.q 7a1e0f6295 > ql/src/test/queries/clientpositive/compute_stats_long.q 6a2070f780 > ql/src/test/queries/clientpositive/compute_stats_string.q 0023e7f6bd > ql/src/test/queries/clientpositive/hll.q PRE-CREATION > ql/src/test/queries/clientpositive/reduceSinkDeDuplication_pRS_key_empty.q > 8bbae3914d > ql/src/test/queries/clientpositive/varchar_udf1.q 4d1f884ea7 > ql/src/test/queries/clientpositive/vector_udf1.q 48d3e1ee4d > ql/src/test/results/clientpositive/alter_partition_update_status.q.out > 922822e6d2 > ql/src/test/results/clientpositive/alter_table_column_stats.q.out > 2cc7cbc7b6 > ql/src/test/results/clientpositive/alter_table_update_status.q.out > e26e8cba1c > ql/src/test/results/clientpositive/analyze_tbl_part.q.out ed90b6fc92 > ql/src/test/results/clientpositive/annotate_stats_deep_filters.q.out > 95dd6abaec > ql/src/test/results/clientpositive/annotate_stats_groupby.q.out a8e4854a00 > ql/src/test/results/clientpositive/annotate_stats_join.q.out c1a140b558 > ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out > f7d73c9ddf > ql/src/test/results/clientpositive/autoColumnStats_4.q.out fe3b9e53ef > ql/src/test/results/clientpositive/autoColumnStats_5.q.out e19fb5f504 > ql/src/test/results/clientpositive/autoColumnStats_6.q.out 29b3373e10 > ql/src/test/results/clientpositive/autoColumnStats_7.q.out 9d24bc53ab > ql/src/test/results/clientpositive/autoColumnStats_8.q.out 681d962ed0 > ql/src/test/results/clientpositive/autoColumnStats_9.q.out d26e2c02b7 > ql/src/test/results/clientpositive/auto_join_without_localtask.q.out > 17a912ec13 > ql/src/test/results/clientpositive/avro_decimal.q.out 5a3b72defe > ql/src/test/results/clientpositive/avro_decimal_native.q.out fe77512191 > ql/src/test/results/clientpositive/cbo_rp_annotate_stats_groupby.q.out > f260f034b6 > ql/src/test/results/clientpositive/cbo_rp_join0.q.out b9cf3ceab4 > ql/src/test/results/clientpositive/char_udf1.q.out 07ce108a75 > ql/src/test/results/clientpositive/colstats_all_nulls.q.out 14c5d5b59b > ql/src/test/results/clientpositive/column_pruner_multiple_children.q.out > 96feeed49c > ql/src/test/results/clientpositive/columnstats_partlvl.q.out 07d26e92bb > ql/src/test/results/clientpositive/columnstats_partlvl_dp.q.out 468d2e797b > ql/src/test/results/clientpositive/columnstats_quoting.q.out 52e35385a1 > ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 462d4c1771 > ql/src/test/results/clientpositive/compute_stats_date.q.out c2472377a8 > ql/src/test/results/clientpositive/compute_stats_decimal.q.out e0584c50a8 > ql/src/test/results/clientpositive/compute_stats_double.q.out 5b921735f0 > ql/src/test/results/clientpositive/compute_stats_long.q.out 119d1731cc > ql/src/test/results/clientpositive/compute_stats_string.q.out 8c40490bc0 > ql/src/test/results/clientpositive/confirm_initial_tbl_stats.q.out > faa14ba9c5 > ql/src/test/results/clientpositive/constant_prop_2.q.out 24be5188e2 > ql/src/test/results/clientpositive/correlated_join_keys.q.out ec5d008728 > ql/src/test/results/clientpositive/cross_join_merge.q.out f4956ded22 > ql/src/test/results/clientpositive/decimal_stats.q.out 5d86866e2a > ql/src/test/results/clientpositive/describe_table.q.out 7869494252 > ql/src/test/results/clientpositive/display_colstats_tbllvl.q.out a4b18d7cec > ql/src/test/results/clientpositive/encrypted/encryption_move_tbl.q.out > 06f8408327 > ql/src/test/results/clientpositive/exec_parallel_column_stats.q.out > f256ec11bf > ql/src/test/results/clientpositive/hll.q.out PRE-CREATION > ql/src/test/results/clientpositive/infer_bucket_sort.q.out 1aea388815 > ql/src/test/results/clientpositive/join32.q.out a191284aca > ql/src/test/results/clientpositive/join33.q.out a191284aca > ql/src/test/results/clientpositive/join_parse.q.out 17733acdc3 > ql/src/test/results/clientpositive/llap/autoColumnStats_2.q.out ec209b2ef1 > ql/src/test/results/clientpositive/llap/auto_join1.q.out 6a0a1d5d09 > ql/src/test/results/clientpositive/llap/auto_join21.q.out 25cd67e7f5 > ql/src/test/results/clientpositive/llap/auto_join29.q.out f693ce4512 > ql/src/test/results/clientpositive/llap/auto_join30.q.out 91a80127a9 > ql/src/test/results/clientpositive/llap/cluster.q.out 2fa976b6d5 > ql/src/test/results/clientpositive/llap/column_table_stats.q.out fb04ee8cf9 > ql/src/test/results/clientpositive/llap/column_table_stats_orc.q.out > d55cf30331 > ql/src/test/results/clientpositive/llap/columnstats_part_coltype.q.out > dc50fb7fc1 > ql/src/test/results/clientpositive/llap/correlationoptimizer1.q.out > fab5c9c029 > ql/src/test/results/clientpositive/llap/correlationoptimizer2.q.out > ee645410b9 > ql/src/test/results/clientpositive/llap/correlationoptimizer3.q.out > f1af97d98d > ql/src/test/results/clientpositive/llap/correlationoptimizer6.q.out > 9b71d3ec82 > ql/src/test/results/clientpositive/llap/count_dist_rewrite.q.out 6d411365b8 > ql/src/test/results/clientpositive/llap/cross_join.q.out ae3f9bf6f5 > ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out > 7fd7e20e15 > ql/src/test/results/clientpositive/llap/dynamic_semijoin_user_level.q.out > c4b18b7118 > ql/src/test/results/clientpositive/llap/dynpart_sort_optimization2.q.out > d4811d64d7 > ql/src/test/results/clientpositive/llap/explainuser_1.q.out 87b8cae445 > ql/src/test/results/clientpositive/llap/explainuser_2.q.out f8a6526c67 > ql/src/test/results/clientpositive/llap/explainuser_4.q.out 95d13216e7 > > ql/src/test/results/clientpositive/llap/extrapolate_part_stats_partial_ndv.q.out > d97223c9d0 > ql/src/test/results/clientpositive/llap/filter_union.q.out c4af31715c > ql/src/test/results/clientpositive/llap/groupby1.q.out 0eecbb6f4e > ql/src/test/results/clientpositive/llap/groupby2.q.out 29b85d1f44 > ql/src/test/results/clientpositive/llap/groupby_resolution.q.out f2a6ab05ac > ql/src/test/results/clientpositive/llap/having.q.out 267254c0de > ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_1.q.out > 083bfc301c > ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_2.q.out > a59188a46a > ql/src/test/results/clientpositive/llap/identity_project_remove_skip.q.out > b03b96b463 > ql/src/test/results/clientpositive/llap/jdbc_handler.q.out b4feb0ee1b > ql/src/test/results/clientpositive/llap/join1.q.out d79a405a41 > ql/src/test/results/clientpositive/llap/join32_lessSize.q.out c226eed126 > ql/src/test/results/clientpositive/llap/join_max_hashtable.q.out 85d45fe712 > ql/src/test/results/clientpositive/llap/limit_join_transpose.q.out > 99d8d6f664 > ql/src/test/results/clientpositive/llap/limit_pushdown.q.out 93315e6b1b > ql/src/test/results/clientpositive/llap/limit_pushdown3.q.out 351ee01b45 > ql/src/test/results/clientpositive/llap/llap_smb.q.out e3044baf52 > ql/src/test/results/clientpositive/llap/llap_stats.q.out f81ad50679 > ql/src/test/results/clientpositive/llap/llap_vector_nohybridgrace.q.out > 0e9e1207cb > ql/src/test/results/clientpositive/llap/llapdecider.q.out 69312cd6a2 > ql/src/test/results/clientpositive/llap/merge1.q.out 8021b67733 > ql/src/test/results/clientpositive/llap/merge2.q.out 7bcdd2d57e > ql/src/test/results/clientpositive/llap/mergejoin.q.out 10fb45d284 > ql/src/test/results/clientpositive/llap/mrr.q.out fe477fd815 > ql/src/test/results/clientpositive/llap/multiMapJoin2.q.out 25378d3e8b > ql/src/test/results/clientpositive/llap/offset_limit.q.out adfeb05448 > ql/src/test/results/clientpositive/llap/offset_limit_ppd_optimizer.q.out > 48e58e1bd9 > ql/src/test/results/clientpositive/llap/parallel_colstats.q.out 95ed8b813a > ql/src/test/results/clientpositive/llap/ptf.q.out fbaf1e6474 > ql/src/test/results/clientpositive/llap/ptf_streaming.q.out 6013c11c9e > ql/src/test/results/clientpositive/llap/reduce_deduplicate_extended.q.out > bc44db795a > ql/src/test/results/clientpositive/llap/semijoin_hint.q.out 76c985e727 > ql/src/test/results/clientpositive/llap/skewjoin.q.out dc79b26020 > ql/src/test/results/clientpositive/llap/subquery_exists.q.out 0749872253 > ql/src/test/results/clientpositive/llap/subquery_in.q.out 1d5f547adf > ql/src/test/results/clientpositive/llap/subquery_multi.q.out 41b8ea3f25 > ql/src/test/results/clientpositive/llap/subquery_notin.q.out c760af2f87 > ql/src/test/results/clientpositive/llap/subquery_scalar.q.out 03d5f191b6 > ql/src/test/results/clientpositive/llap/subquery_select.q.out 4917c42b0e > ql/src/test/results/clientpositive/llap/subquery_views.q.out b64e0f49c6 > ql/src/test/results/clientpositive/llap/sysdb.q.out fbbf8d9b7f > ql/src/test/results/clientpositive/llap/tez_dml.q.out 786929e7af > ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_1.q.out > 626a0640da > ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_2.q.out > 273cf06e31 > ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_3.q.out > 04e0f86fe2 > ql/src/test/results/clientpositive/llap/tez_join_hash.q.out 92a188e18a > ql/src/test/results/clientpositive/llap/tez_join_tests.q.out 1e1b63d95e > ql/src/test/results/clientpositive/llap/tez_joins_explain.q.out 5297d7ecbc > ql/src/test/results/clientpositive/llap/tez_smb_main.q.out 66d7aeca70 > ql/src/test/results/clientpositive/llap/tez_union.q.out c72b232b35 > ql/src/test/results/clientpositive/llap/tez_union2.q.out 7b45c7c719 > ql/src/test/results/clientpositive/llap/tez_union_multiinsert.q.out > 5e0d072095 > ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_1.q.out > a38e339012 > ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_2.q.out > f7c694bf84 > ql/src/test/results/clientpositive/llap/unionDistinct_1.q.out 602d57625c > ql/src/test/results/clientpositive/llap/union_top_level.q.out 268e0413cd > ql/src/test/results/clientpositive/llap/varchar_udf1.q.out 8162305df6 > ql/src/test/results/clientpositive/llap/vector_groupby_mapjoin.q.out > 16b716c4e5 > ql/src/test/results/clientpositive/llap/vector_left_outer_join.q.out > 6118c73c2b > ql/src/test/results/clientpositive/llap/vector_mapjoin_reduce.q.out > 8df5a64fee > ql/src/test/results/clientpositive/llap/vector_outer_join1.q.out d95604d722 > ql/src/test/results/clientpositive/llap/vector_outer_join2.q.out 16de760424 > ql/src/test/results/clientpositive/llap/vector_udf1.q.out 16edaacf94 > ql/src/test/results/clientpositive/llap/vectorization_0.q.out fba9c07350 > ql/src/test/results/clientpositive/llap/vectorization_8.q.out 0d5b6d53e0 > ql/src/test/results/clientpositive/llap/vectorization_short_regress.q.out > 00577620d8 > ql/src/test/results/clientpositive/llap/vectorized_distinct_gby.q.out > c3e5f7c90d > > ql/src/test/results/clientpositive/llap/vectorized_dynamic_semijoin_reduction.q.out > 9a1c44c3e6 > > ql/src/test/results/clientpositive/llap/vectorized_dynamic_semijoin_reduction2.q.out > a03466f859 > ql/src/test/results/clientpositive/llap/vectorized_mapjoin.q.out 8160bc7c44 > > ql/src/test/results/clientpositive/llap/vectorized_multi_output_select.q.out > f744eb6513 > ql/src/test/results/clientpositive/llap/vectorized_nested_mapjoin.q.out > 28a8340a9c > ql/src/test/results/clientpositive/llap/vectorized_shufflejoin.q.out > 73ab9fca82 > ql/src/test/results/clientpositive/llap/windowing_gby.q.out 2c47b8b2a6 > ql/src/test/results/clientpositive/parallel_colstats.q.out c85113137b > ql/src/test/results/clientpositive/partial_column_stats.q.out 5876efacf3 > ql/src/test/results/clientpositive/partition_coltype_literals.q.out > 3505556029 > > ql/src/test/results/clientpositive/reduceSinkDeDuplication_pRS_key_empty.q.out > 3ad09e815c > ql/src/test/results/clientpositive/remove_exprs_stats.q.out 33cf90ae9d > ql/src/test/results/clientpositive/rename_external_partition_location.q.out > ec4076f908 > ql/src/test/results/clientpositive/rename_table_update_column_stats.q.out > b3d6f039ac > ql/src/test/results/clientpositive/temp_table_display_colstats_tbllvl.q.out > bd024a7ab1 > ql/src/test/results/clientpositive/tez/explainanalyze_1.q.out 6602222ed7 > ql/src/test/results/clientpositive/tez/explainanalyze_2.q.out 0916565f0f > ql/src/test/results/clientpositive/tez/explainanalyze_3.q.out e5c8d6c51e > ql/src/test/results/clientpositive/tez/explainanalyze_4.q.out 9fbe8c5263 > ql/src/test/results/clientpositive/tez/explainanalyze_5.q.out b35e294813 > ql/src/test/results/clientpositive/tez/explainuser_3.q.out 65c9114b20 > ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_1.q.out > 0a1e039cf1 > ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_2.q.out > 6f5a3a96ca > ql/src/test/results/clientpositive/tez/vectorization_limit.q.out afcae8c34c > ql/src/test/results/clientpositive/tunable_ndv.q.out 6ae54b4927 > > > Diff: https://reviews.apache.org/r/60753/diff/3/ > > > Testing > ------- > > > Thanks, > > pengcheng xiong > >