> On July 10, 2017, 10:02 p.m., Prasanth_J wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Line 1723 (original), 1723 (patched)
> > <https://reviews.apache.org/r/60753/diff/1/?file=1773281#file1773281line1723>
> >
> >     I am not sure if we need this config. 
> >     Any reason to support storing FM sketch's bitvectors?
> >     
> >     If this is for 3.0.0 release only, we could even remove this config.
> >     
> >     If we need to support FM sketch based NDV then having a separate field 
> > in metastore will be better as Ashutosh suggested.
> 
> pengcheng xiong wrote:
>     This config is just to give user an option to choose whether to use FM or 
> HLL to COMPUTE ndv. This patch does not change the fact that we do not store 
> bit vectors for both FM and NDV in object store.
> 
> Prasanth_J wrote:
>     I don't think it will be useful for user to tune this.
>     IMHO, most users won't care a lot about this. This can be used as 
> fallback but this will only add more checks.

I agree with Prasanth. Overloading this config seems hacky. It is leading to 
confusing code, e.g, NDVEstimator uses number of bit vectors to determine ndv 
algo, thats confusing. Its confusing for end user too that +ve value from 
[0,100] is useful for one algo, but -ve value is only to switch over to diff 
algo. I suggest that we leave this config as is and introduce a new config 
which dictates which algo is used for ndv computation. This config value is 
than passed to compute_stats() udaf by ColumnStatsSemanticAnalyzer. 
GenericUDAFComputeStats than uses this config to determine which algo its 
using. 
This config we can store in metastore (either as is or mapped as int) so that 
we can deserialize the bit vectors correctly when we retrieve them.


- Ashutosh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60753/#review180119
-----------------------------------------------------------


On July 10, 2017, 9:29 p.m., pengcheng xiong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60753/
> -----------------------------------------------------------
> 
> (Updated July 10, 2017, 9:29 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-16966
> 
> 
> Diffs
> -----
> 
>   common/pom.xml e6722babd8 
>   common/src/java/org/apache/hadoop/hive/common/HiveStatsUtils.java 
> 7c9d72fbd2 
>   
> common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimator.java
>  PRE-CREATION 
>   
> common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimatorFactory.java
>  PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLConstants.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLDenseRegister.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLRegister.java 
> PRE-CREATION 
>   
> common/src/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLogUtils.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 5700fb9325 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java.orig da48a7ccbd 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/NumDistinctValueEstimator.java
>  92f9a845e3 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DecimalColumnStatsAggregator.java
>  36b2c9c56b 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/DoubleColumnStatsAggregator.java
>  a88ef84e5c 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/LongColumnStatsAggregator.java
>  8ac6561aec 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/StringColumnStatsAggregator.java
>  2aa4046a46 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMerger.java
>  33c7e3e52c 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/ColumnStatsMergerFactory.java
>  fe890e4e27 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DateColumnStatsMerger.java
>  3179b23438 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DecimalColumnStatsMerger.java
>  c13add9d9c 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/DoubleColumnStatsMerger.java
>  fbdba24b0a 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/LongColumnStatsMerger.java
>  ac65590505 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/hbase/stats/merge/StringColumnStatsMerger.java
>  41587477d3 
>   pom.xml f9fae59a5d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DecimalNumDistinctValueEstimator.java
>  a05906edfa 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DoubleNumDistinctValueEstimator.java
>  e76fc74dbc 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java
>  2ebfcb2360 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/LongNumDistinctValueEstimator.java
>  1c197a028a 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java
>  fa70f49857 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/StringNumDistinctValueEstimator.java
>  601901c163 
>   ql/src/test/queries/clientpositive/hll.q PRE-CREATION 
>   ql/src/test/results/clientpositive/hll.q.out PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/60753/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> pengcheng xiong
> 
>

Reply via email to