okumin commented on code in PR #6244:
URL: https://github.com/apache/hive/pull/6244#discussion_r2781267101
##########
ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java:
##########
@@ -2086,11 +2088,7 @@ private static List<Long> extractNDVGroupingColumns(List<ColStatistics> colStats
// compute product of distinct values of grouping columns
for (ColStatistics cs : colStats) {
if (cs != null) {
- long ndv = cs.getCountDistint();
- if (cs.getNumNulls() > 0) {
- ndv = StatsUtils.safeAdd(ndv, 1);
- }
- ndvValues.add(ndv);
+ ndvValues.add(getGroupingColumnNdv(cs, parentStats));
Review Comment:
After skimming all the test files, I would like to separate the unrelated
changes, like below.
- HIVE-29368: UDF changes
- HIVE-XXXXX: `cs.setCountDistint(csd.getTimestampStats().getNumDVs())` and
similar changes
- HIVE-XXXXX: `getGroupingColumnNdv` and related changes
If they are separated, I can review each of them in about 30 minutes, so I
will spend only about 90 minutes in total. With everything in one PR, it is
not obvious why each test case has changed, so the review needs more focus.
We also can't make a checkpoint, because nothing can be merged until all
changes are reasonable and all test cases are green (I know some integration
tests are still failing and SonarCloud is still reporting some issues). This
proposal is negotiable because it requires extra effort from you. I should
have proposed it at the beginning.
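For context when splitting, here is a minimal sketch of the inline logic that the hunk above removes: the NDV of a grouping column is bumped by one when the column contains nulls, since NULL forms its own group. The class name `RemovedNdvSketch`, the method `ndvWithNullGroup`, and the local `safeAdd` are hypothetical stand-ins for illustration; the local `safeAdd` assumes that `StatsUtils.safeAdd` saturates at `Long.MAX_VALUE` on overflow. It does not try to reproduce `getGroupingColumnNdv`, whose body is not shown in this hunk.

```java
class RemovedNdvSketch {
    // Hypothetical stand-in for StatsUtils.safeAdd; assumed to saturate
    // at Long.MAX_VALUE instead of overflowing.
    static long safeAdd(long a, long b) {
        try {
            return Math.addExact(a, b);
        } catch (ArithmeticException e) {
            return Long.MAX_VALUE;
        }
    }

    // Mirrors the removed lines: start from the distinct count and add one
    // extra group for NULL when the column has any nulls.
    static long ndvWithNullGroup(long countDistinct, long numNulls) {
        long ndv = countDistinct;
        if (numNulls > 0) {
            ndv = safeAdd(ndv, 1);
        }
        return ndv;
    }

    public static void main(String[] args) {
        System.out.println(ndvWithNullGroup(10L, 0L));
        System.out.println(ndvWithNullGroup(10L, 5L));
        System.out.println(ndvWithNullGroup(Long.MAX_VALUE, 1L));
    }
}
```

Keeping this baseline in its own ticket would make it easier to see which test output changes come from the new helper alone.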
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]