[ https://issues.apache.org/jira/browse/HIVE-27000?focusedWorklogId=842453&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-842453 ]
ASF GitHub Bot logged work on HIVE-27000: ----------------------------------------- Author: ASF GitHub Bot Created on: 31/Jan/23 08:14 Start Date: 31/Jan/23 08:14 Worklog Time Spent: 10m Work Description: akshat0395 commented on code in PR #3997: URL: https://github.com/apache/hive/pull/3997#discussion_r1091578039 ########## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/columnstats/merge/DateColumnStatsMerger.java: ########## @@ -43,64 +46,57 @@ public void merge(ColumnStatisticsObj aggregateColStats, ColumnStatisticsObj new DateColumnStatsDataInspector aggregateData = dateInspectorFromStats(aggregateColStats); DateColumnStatsDataInspector newData = dateInspectorFromStats(newColStats); - setLowValue(aggregateData, newData); - setHighValue(aggregateData, newData); - - aggregateData.setNumNulls(aggregateData.getNumNulls() + newData.getNumNulls()); - if (aggregateData.getNdvEstimator() == null || newData.getNdvEstimator() == null) { - aggregateData.setNumDVs(Math.max(aggregateData.getNumDVs(), newData.getNumDVs())); - } else { - NumDistinctValueEstimator oldEst = aggregateData.getNdvEstimator(); - NumDistinctValueEstimator newEst = newData.getNdvEstimator(); - final long ndv; - if (oldEst.canMerge(newEst)) { - oldEst.mergeEstimators(newEst); - ndv = oldEst.estimateNumDistinctValues(); - aggregateData.setNdvEstimator(oldEst); - } else { - ndv = Math.max(aggregateData.getNumDVs(), newData.getNumDVs()); - } - LOG.debug("Use bitvector to merge column {}'s ndvs of {} and {} to be {}", aggregateColStats.getColName(), - aggregateData.getNumDVs(), newData.getNumDVs(), ndv); - aggregateData.setNumDVs(ndv); + Date lowValue = mergeLowValue(getLowValue(aggregateData), getLowValue(newData)); + if (lowValue != null) { + aggregateData.setLowValue(lowValue); + } + Date highValue = mergeHighValue(getHighValue(aggregateData), getHighValue(newData)); + if (highValue != null) { Review Comment: This code block seems to be duplicate for multiple StatsMerger classes, can we move this to a shared class to avoid duplication. Issue Time Tracking ------------------- Worklog Id: (was: 842453) Time Spent: 40m (was: 0.5h) > Improve the modularity of the *ColumnStatsMerger classes > -------------------------------------------------------- > > Key: HIVE-27000 > URL: https://issues.apache.org/jira/browse/HIVE-27000 > Project: Hive > Issue Type: Improvement > Components: Statistics > Affects Versions: 4.0.0-alpha-2 > Reporter: Alessandro Solimando > Assignee: Alessandro Solimando > Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > *ColumnStatsMerger classes contain a lot of duplicate code which is not > specific to the data type, and that could therefore be lifted to a common > parent class. > This phenomenon is bound to become even worse if we keep enriching further > our supported set of statistics as we did in the context of HIVE-26221. > The current ticket aims at improving the modularity and code reuse of the > *ColumnStatsMerger classes, while improving unit-test coverage to cover all > classes and support more use-cases. -- This message was sent by Atlassian Jira (v8.20.10#820010)