Re: [PR] HIVE-29368: more conservative NDV combining by PessimisticStatCombiner [hive]

via GitHub Mon, 09 Feb 2026 00:29:59 -0800


okumin commented on code in PR #6244:
URL: https://github.com/apache/hive/pull/6244#discussion_r2781267101



##########
ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java:
##########
@@ -2086,11 +2088,7 @@ private static List<Long> extractNDVGroupingColumns(List<ColStatistics> colStats
     // compute product of distinct values of grouping columns
     for (ColStatistics cs : colStats) {
       if (cs != null) {
-        long ndv = cs.getCountDistint();
-        if (cs.getNumNulls() > 0) {
-          ndv = StatsUtils.safeAdd(ndv, 1);
-        }
-        ndvValues.add(ndv);
+        ndvValues.add(getGroupingColumnNdv(cs, parentStats));

Review Comment:
   After glancing through all the test files, I would like to separate the
   unrelated changes, like below.
   - HIVE-29368: UDF changes
   - HIVE-XXXXX: `cs.setCountDistint(csd.getTimestampStats().getNumDVs())` and similar changes
   - HIVE-XXXXX: `getGroupingColumnNdv` and related changes
   
   If they are separated, I can review each of them in 30 minutes, so I will
   spend only 90 minutes in total. With everything in one PR, it is not obvious
   why each test case has changed, so the review needs more focus. We also
   can't make a checkpoint, because nothing can be merged until all changes are
   reasonable and all test cases are green (I know some integration tests are
   still failing and Sonar Cloud is still reporting some issues). This proposal
   is negotiable because it requires extra effort from you. I should have
   proposed it at the beginning.



