> On July 14, 2017, 12:55 a.m., Ashutosh Chauhan wrote:
> > common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java
> > Lines 28-30 (patched)
> > <https://reviews.apache.org/r/60753/diff/2/?file=1775588#file1775588line28>
> >
> >     Is it really worth to have a dependency on 3rd party jar for this 
> > class. Will using TreeMap be really slow?
> >     Javolution library on which we already have a dependcy also provides 
> > these fast data structures. If really needed, we shall use those instead.

The original dependency was added to a have pretty flat treemap + list. Fast 
path inserts go to list which will get sorted and merged to treemap. Fastutil 
has primitive hash maps/lists with minimal object allocations. 
Since we are using HLL in the context of ANALYZE I think this will not have a 
huge impact in terms performance. Also the threshold to switch between SPARSE 
to DENSE is pretty low (in the order of 1000s), object allocations in inner 
loop will not be significant. 

I agree with Ashutosh, we can remove this runtime dependency as it will not 
have big impact. Replacing with java Treemap should work pretty well. If using 
Treemap becomes a bottleneck we can replace it later on in a follow up jira.

Also I would recommend having these JUnit tests as well to make sure Treemap 
replacement goes well
https://github.com/prasanthj/hyperloglog/tree/master/src/test/com/github/prasanthj/hll


- Prasanth_J


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60753/#review180494
-----------------------------------------------------------


On July 14, 2017, 12:08 a.m., pengcheng xiong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60753/
> -----------------------------------------------------------
> 
> (Updated July 14, 2017, 12:08 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-16966
> 
> 
> Diffs
> -----
> 
>   common/pom.xml 023f084511 
>   
> common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimatorFactory.java
>  PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLConstants.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLDenseRegister.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLRegister.java 
> PRE-CREATION 
>   
> common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLogUtils.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java c7afe2bc4a 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/NumDistinctValueEstimator.java
>  92f9a845e3 
>   metastore/src/java/org/apache/hadoop/hive/metastore/hbase/StatsCache.java 
> 18f8afc9ad 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/ColumnStatsAggregator.java
>  31955b4363 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/ColumnStatsAggregatorFactory.java
>  daf85692eb 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DecimalColumnStatsAggregator.java
>  36b2c9c56b 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DoubleColumnStatsAggregator.java
>  a88ef84e5c 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/LongColumnStatsAggregator.java
>  8ac6561aec 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/StringColumnStatsAggregator.java
>  2aa4046a46 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMerger.java
>  33c7e3e52c 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMergerFactory.java
>  fe890e4e27 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DateColumnStatsMerger.java
>  3179b23438 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DecimalColumnStatsMerger.java
>  c13add9d9c 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DoubleColumnStatsMerger.java
>  fbdba24b0a 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/LongColumnStatsMerger.java
>  ac65590505 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/StringColumnStatsMerger.java
>  41587477d3 
>   
> metastore/src/test/org/apache/hadoop/hive/metastore/hbase/TestHBaseAggregateStatsNDVUniformDist.java
>  87b1ac870d 
>   pom.xml 32f5fd1493 
>   
> ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 
> 0a5cf00c44 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java 76f7daeb1b 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DecimalNumDistinctValueEstimator.java
>  a05906edfa 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DoubleNumDistinctValueEstimator.java
>  e76fc74dbc 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java
>  2ebfcb2360 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/LongNumDistinctValueEstimator.java
>  1c197a028a 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java
>  fa70f49857 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/StringNumDistinctValueEstimator.java
>  601901c163 
>   ql/src/test/queries/clientpositive/char_udf1.q 39aa0e0e17 
>   ql/src/test/queries/clientpositive/compute_stats_date.q 09128f6fb9 
>   ql/src/test/queries/clientpositive/compute_stats_decimal.q 76e1468ada 
>   ql/src/test/queries/clientpositive/compute_stats_double.q 7a1e0f6295 
>   ql/src/test/queries/clientpositive/compute_stats_long.q 6a2070f780 
>   ql/src/test/queries/clientpositive/compute_stats_string.q 0023e7f6bd 
>   ql/src/test/queries/clientpositive/hll.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/reduceSinkDeDuplication_pRS_key_empty.q 
> 8bbae3914d 
>   ql/src/test/queries/clientpositive/varchar_udf1.q 4d1f884ea7 
>   ql/src/test/queries/clientpositive/vector_udf1.q 48d3e1ee4d 
>   ql/src/test/results/clientpositive/alter_partition_update_status.q.out 
> 922822e6d2 
>   ql/src/test/results/clientpositive/alter_table_column_stats.q.out 
> 2cc7cbc7b6 
>   ql/src/test/results/clientpositive/alter_table_update_status.q.out 
> e26e8cba1c 
>   ql/src/test/results/clientpositive/analyze_tbl_part.q.out ed90b6fc92 
>   ql/src/test/results/clientpositive/annotate_stats_deep_filters.q.out 
> 95dd6abaec 
>   ql/src/test/results/clientpositive/annotate_stats_groupby.q.out a8e4854a00 
>   ql/src/test/results/clientpositive/annotate_stats_join.q.out c1a140b558 
>   ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out 
> f7d73c9ddf 
>   ql/src/test/results/clientpositive/autoColumnStats_4.q.out fe3b9e53ef 
>   ql/src/test/results/clientpositive/autoColumnStats_5.q.out e19fb5f504 
>   ql/src/test/results/clientpositive/autoColumnStats_6.q.out 29b3373e10 
>   ql/src/test/results/clientpositive/autoColumnStats_7.q.out 9d24bc53ab 
>   ql/src/test/results/clientpositive/autoColumnStats_8.q.out 681d962ed0 
>   ql/src/test/results/clientpositive/autoColumnStats_9.q.out d26e2c02b7 
>   ql/src/test/results/clientpositive/auto_join_without_localtask.q.out 
> 17a912ec13 
>   ql/src/test/results/clientpositive/avro_decimal.q.out 5a3b72defe 
>   ql/src/test/results/clientpositive/avro_decimal_native.q.out fe77512191 
>   ql/src/test/results/clientpositive/cbo_rp_annotate_stats_groupby.q.out 
> f260f034b6 
>   ql/src/test/results/clientpositive/cbo_rp_join0.q.out b9cf3ceab4 
>   ql/src/test/results/clientpositive/char_udf1.q.out 07ce108a75 
>   ql/src/test/results/clientpositive/colstats_all_nulls.q.out 14c5d5b59b 
>   ql/src/test/results/clientpositive/column_pruner_multiple_children.q.out 
> 96feeed49c 
>   ql/src/test/results/clientpositive/columnstats_partlvl.q.out 07d26e92bb 
>   ql/src/test/results/clientpositive/columnstats_partlvl_dp.q.out 468d2e797b 
>   ql/src/test/results/clientpositive/columnstats_quoting.q.out 52e35385a1 
>   ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 462d4c1771 
>   ql/src/test/results/clientpositive/compute_stats_date.q.out c2472377a8 
>   ql/src/test/results/clientpositive/compute_stats_decimal.q.out e0584c50a8 
>   ql/src/test/results/clientpositive/compute_stats_double.q.out 5b921735f0 
>   ql/src/test/results/clientpositive/compute_stats_long.q.out 119d1731cc 
>   ql/src/test/results/clientpositive/compute_stats_string.q.out 8c40490bc0 
>   ql/src/test/results/clientpositive/confirm_initial_tbl_stats.q.out 
> faa14ba9c5 
>   ql/src/test/results/clientpositive/constant_prop_2.q.out 24be5188e2 
>   ql/src/test/results/clientpositive/correlated_join_keys.q.out ec5d008728 
>   ql/src/test/results/clientpositive/cross_join_merge.q.out f4956ded22 
>   ql/src/test/results/clientpositive/decimal_stats.q.out 5d86866e2a 
>   ql/src/test/results/clientpositive/describe_table.q.out 7869494252 
>   ql/src/test/results/clientpositive/display_colstats_tbllvl.q.out a4b18d7cec 
>   ql/src/test/results/clientpositive/encrypted/encryption_move_tbl.q.out 
> 06f8408327 
>   ql/src/test/results/clientpositive/exec_parallel_column_stats.q.out 
> f256ec11bf 
>   ql/src/test/results/clientpositive/hll.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/infer_bucket_sort.q.out 1aea388815 
>   ql/src/test/results/clientpositive/join32.q.out a191284aca 
>   ql/src/test/results/clientpositive/join33.q.out a191284aca 
>   ql/src/test/results/clientpositive/join_parse.q.out 17733acdc3 
>   ql/src/test/results/clientpositive/llap/autoColumnStats_2.q.out ec209b2ef1 
>   ql/src/test/results/clientpositive/llap/auto_join1.q.out 6a0a1d5d09 
>   ql/src/test/results/clientpositive/llap/auto_join21.q.out 25cd67e7f5 
>   ql/src/test/results/clientpositive/llap/auto_join29.q.out f693ce4512 
>   ql/src/test/results/clientpositive/llap/auto_join30.q.out 91a80127a9 
>   ql/src/test/results/clientpositive/llap/cluster.q.out 2fa976b6d5 
>   ql/src/test/results/clientpositive/llap/column_table_stats.q.out fb04ee8cf9 
>   ql/src/test/results/clientpositive/llap/column_table_stats_orc.q.out 
> d55cf30331 
>   ql/src/test/results/clientpositive/llap/columnstats_part_coltype.q.out 
> dc50fb7fc1 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer1.q.out 
> fab5c9c029 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer2.q.out 
> ee645410b9 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer3.q.out 
> f1af97d98d 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer6.q.out 
> 9b71d3ec82 
>   ql/src/test/results/clientpositive/llap/count_dist_rewrite.q.out 6d411365b8 
>   ql/src/test/results/clientpositive/llap/cross_join.q.out ae3f9bf6f5 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 
> 7fd7e20e15 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_user_level.q.out 
> c4b18b7118 
>   ql/src/test/results/clientpositive/llap/dynpart_sort_optimization2.q.out 
> d4811d64d7 
>   ql/src/test/results/clientpositive/llap/explainuser_1.q.out 87b8cae445 
>   ql/src/test/results/clientpositive/llap/explainuser_2.q.out f8a6526c67 
>   ql/src/test/results/clientpositive/llap/explainuser_4.q.out 95d13216e7 
>   
> ql/src/test/results/clientpositive/llap/extrapolate_part_stats_partial_ndv.q.out
>  d97223c9d0 
>   ql/src/test/results/clientpositive/llap/filter_union.q.out c4af31715c 
>   ql/src/test/results/clientpositive/llap/groupby1.q.out 0eecbb6f4e 
>   ql/src/test/results/clientpositive/llap/groupby2.q.out 29b85d1f44 
>   ql/src/test/results/clientpositive/llap/groupby_resolution.q.out f2a6ab05ac 
>   ql/src/test/results/clientpositive/llap/having.q.out 267254c0de 
>   ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_1.q.out 
> 083bfc301c 
>   ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_2.q.out 
> a59188a46a 
>   ql/src/test/results/clientpositive/llap/identity_project_remove_skip.q.out 
> b03b96b463 
>   ql/src/test/results/clientpositive/llap/jdbc_handler.q.out b4feb0ee1b 
>   ql/src/test/results/clientpositive/llap/join1.q.out d79a405a41 
>   ql/src/test/results/clientpositive/llap/join32_lessSize.q.out c226eed126 
>   ql/src/test/results/clientpositive/llap/join_max_hashtable.q.out 85d45fe712 
>   ql/src/test/results/clientpositive/llap/limit_join_transpose.q.out 
> 99d8d6f664 
>   ql/src/test/results/clientpositive/llap/limit_pushdown.q.out 93315e6b1b 
>   ql/src/test/results/clientpositive/llap/limit_pushdown3.q.out 351ee01b45 
>   ql/src/test/results/clientpositive/llap/llap_smb.q.out e3044baf52 
>   ql/src/test/results/clientpositive/llap/llap_stats.q.out f81ad50679 
>   ql/src/test/results/clientpositive/llap/llap_vector_nohybridgrace.q.out 
> 0e9e1207cb 
>   ql/src/test/results/clientpositive/llap/llapdecider.q.out 69312cd6a2 
>   ql/src/test/results/clientpositive/llap/merge1.q.out 8021b67733 
>   ql/src/test/results/clientpositive/llap/merge2.q.out 7bcdd2d57e 
>   ql/src/test/results/clientpositive/llap/mergejoin.q.out 10fb45d284 
>   ql/src/test/results/clientpositive/llap/mrr.q.out fe477fd815 
>   ql/src/test/results/clientpositive/llap/multiMapJoin2.q.out 25378d3e8b 
>   ql/src/test/results/clientpositive/llap/offset_limit.q.out adfeb05448 
>   ql/src/test/results/clientpositive/llap/offset_limit_ppd_optimizer.q.out 
> 48e58e1bd9 
>   ql/src/test/results/clientpositive/llap/parallel_colstats.q.out 95ed8b813a 
>   ql/src/test/results/clientpositive/llap/ptf.q.out fbaf1e6474 
>   ql/src/test/results/clientpositive/llap/ptf_streaming.q.out 6013c11c9e 
>   ql/src/test/results/clientpositive/llap/reduce_deduplicate_extended.q.out 
> bc44db795a 
>   ql/src/test/results/clientpositive/llap/semijoin_hint.q.out 76c985e727 
>   ql/src/test/results/clientpositive/llap/skewjoin.q.out dc79b26020 
>   ql/src/test/results/clientpositive/llap/subquery_exists.q.out 0749872253 
>   ql/src/test/results/clientpositive/llap/subquery_in.q.out 1d5f547adf 
>   ql/src/test/results/clientpositive/llap/subquery_multi.q.out 41b8ea3f25 
>   ql/src/test/results/clientpositive/llap/subquery_notin.q.out c760af2f87 
>   ql/src/test/results/clientpositive/llap/subquery_scalar.q.out 03d5f191b6 
>   ql/src/test/results/clientpositive/llap/subquery_select.q.out 4917c42b0e 
>   ql/src/test/results/clientpositive/llap/subquery_views.q.out b64e0f49c6 
>   ql/src/test/results/clientpositive/llap/sysdb.q.out fbbf8d9b7f 
>   ql/src/test/results/clientpositive/llap/tez_dml.q.out 786929e7af 
>   ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_1.q.out 
> 626a0640da 
>   ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_2.q.out 
> 273cf06e31 
>   ql/src/test/results/clientpositive/llap/tez_dynpart_hashjoin_3.q.out 
> 04e0f86fe2 
>   ql/src/test/results/clientpositive/llap/tez_join_hash.q.out 92a188e18a 
>   ql/src/test/results/clientpositive/llap/tez_join_tests.q.out 1e1b63d95e 
>   ql/src/test/results/clientpositive/llap/tez_joins_explain.q.out 5297d7ecbc 
>   ql/src/test/results/clientpositive/llap/tez_smb_main.q.out 66d7aeca70 
>   ql/src/test/results/clientpositive/llap/tez_union.q.out c72b232b35 
>   ql/src/test/results/clientpositive/llap/tez_union2.q.out 7b45c7c719 
>   ql/src/test/results/clientpositive/llap/tez_union_multiinsert.q.out 
> 5e0d072095 
>   ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_1.q.out 
> a38e339012 
>   ql/src/test/results/clientpositive/llap/tez_vector_dynpart_hashjoin_2.q.out 
> f7c694bf84 
>   ql/src/test/results/clientpositive/llap/unionDistinct_1.q.out 602d57625c 
>   ql/src/test/results/clientpositive/llap/union_top_level.q.out 268e0413cd 
>   ql/src/test/results/clientpositive/llap/varchar_udf1.q.out 8162305df6 
>   ql/src/test/results/clientpositive/llap/vector_groupby_mapjoin.q.out 
> 16b716c4e5 
>   ql/src/test/results/clientpositive/llap/vector_left_outer_join.q.out 
> 6118c73c2b 
>   ql/src/test/results/clientpositive/llap/vector_mapjoin_reduce.q.out 
> 8df5a64fee 
>   ql/src/test/results/clientpositive/llap/vector_outer_join1.q.out d95604d722 
>   ql/src/test/results/clientpositive/llap/vector_outer_join2.q.out 16de760424 
>   ql/src/test/results/clientpositive/llap/vector_udf1.q.out 16edaacf94 
>   ql/src/test/results/clientpositive/llap/vectorization_0.q.out fba9c07350 
>   ql/src/test/results/clientpositive/llap/vectorization_8.q.out 0d5b6d53e0 
>   ql/src/test/results/clientpositive/llap/vectorization_short_regress.q.out 
> 00577620d8 
>   ql/src/test/results/clientpositive/llap/vectorized_distinct_gby.q.out 
> c3e5f7c90d 
>   
> ql/src/test/results/clientpositive/llap/vectorized_dynamic_semijoin_reduction.q.out
>  9a1c44c3e6 
>   
> ql/src/test/results/clientpositive/llap/vectorized_dynamic_semijoin_reduction2.q.out
>  a03466f859 
>   ql/src/test/results/clientpositive/llap/vectorized_mapjoin.q.out 8160bc7c44 
>   
> ql/src/test/results/clientpositive/llap/vectorized_multi_output_select.q.out 
> f744eb6513 
>   ql/src/test/results/clientpositive/llap/vectorized_nested_mapjoin.q.out 
> 28a8340a9c 
>   ql/src/test/results/clientpositive/llap/vectorized_shufflejoin.q.out 
> 73ab9fca82 
>   ql/src/test/results/clientpositive/llap/windowing_gby.q.out 2c47b8b2a6 
>   ql/src/test/results/clientpositive/parallel_colstats.q.out c85113137b 
>   ql/src/test/results/clientpositive/partial_column_stats.q.out 5876efacf3 
>   ql/src/test/results/clientpositive/partition_coltype_literals.q.out 
> 3505556029 
>   
> ql/src/test/results/clientpositive/reduceSinkDeDuplication_pRS_key_empty.q.out
>  3ad09e815c 
>   ql/src/test/results/clientpositive/remove_exprs_stats.q.out 33cf90ae9d 
>   ql/src/test/results/clientpositive/rename_external_partition_location.q.out 
> ec4076f908 
>   ql/src/test/results/clientpositive/rename_table_update_column_stats.q.out 
> b3d6f039ac 
>   ql/src/test/results/clientpositive/temp_table_display_colstats_tbllvl.q.out 
> bd024a7ab1 
>   ql/src/test/results/clientpositive/tez/explainanalyze_1.q.out 6602222ed7 
>   ql/src/test/results/clientpositive/tez/explainanalyze_2.q.out 0916565f0f 
>   ql/src/test/results/clientpositive/tez/explainanalyze_3.q.out e5c8d6c51e 
>   ql/src/test/results/clientpositive/tez/explainanalyze_4.q.out 9fbe8c5263 
>   ql/src/test/results/clientpositive/tez/explainanalyze_5.q.out b35e294813 
>   ql/src/test/results/clientpositive/tez/explainuser_3.q.out 65c9114b20 
>   ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_1.q.out 
> 0a1e039cf1 
>   ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_2.q.out 
> 6f5a3a96ca 
>   ql/src/test/results/clientpositive/tez/vectorization_limit.q.out afcae8c34c 
>   ql/src/test/results/clientpositive/tunable_ndv.q.out 6ae54b4927 
> 
> 
> Diff: https://reviews.apache.org/r/60753/diff/3/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> pengcheng xiong
> 
>

Reply via email to