codope commented on a change in pull request #5077:
URL: https://github.com/apache/hudi/pull/5077#discussion_r832179099



##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -1042,22 +1046,27 @@ public static void aggregateColumnStats(IndexedRecord record, Schema schema,
 
     schema.getFields().forEach(field -> {
       Map<String, Object> columnStats = columnToStats.getOrDefault(field.name(), new HashMap<>());
-      final String fieldVal = getNestedFieldValAsString((GenericRecord) record, field.name(), true, consistentLogicalTimestampEnabled);
+      final Object fieldVal = getNestedFieldVal((GenericRecord) record, field.name(), true, consistentLogicalTimestampEnabled);
+      final Schema fieldSchema = getNestedFieldSchemaFromRecord((GenericRecord) record, field.name());
       // update stats
-      final int fieldSize = fieldVal == null ? 0 : fieldVal.length();
+      final int fieldSize = fieldVal == null ? 0 : StringUtils.objToString(fieldVal).length();

Review comment:
   This method is used for merging column stats when handling updates for deltacommits. In this case, the records are encoded in Avro by design.
   I agree with your point about size. While there is no immediate use case for size, it might be helpful in the future, e.g. for detecting skew. Since we already have the right size for base files, do you think we can set it to 0 here for now?
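To make the suggestion concrete, here is a minimal, hypothetical Java sketch of per-column stats aggregation that compares values in their typed form (as the diff now does by switching to `getNestedFieldVal`, which returns an `Object`) and simply leaves the size at 0 for log records. `ColumnStatsSketch`, `Stats`, and `aggregate` are illustrative names, not Hudi APIs, and a plain `Map<String, Object>` stands in for the Avro `GenericRecord`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; this is not Hudi's aggregateColumnStats.
public class ColumnStatsSketch {

  // Running stats for one column (illustrative field names).
  static final class Stats {
    Comparable<Object> min;
    Comparable<Object> max;
    long nullCount;
    long valueCount;
    long totalSize; // assumption: left at 0 for log records, per the review suggestion
  }

  // Folds one record (column name -> typed value) into the per-column stats.
  @SuppressWarnings("unchecked")
  static void aggregate(Map<String, Object> record, Map<String, Stats> columnToStats) {
    for (Map.Entry<String, Object> e : record.entrySet()) {
      Stats stats = columnToStats.computeIfAbsent(e.getKey(), k -> new Stats());
      Object val = e.getValue();
      stats.valueCount++;
      if (val == null) {
        stats.nullCount++;
        continue;
      }
      // Compare typed values, not their string renderings.
      Comparable<Object> cmp = (Comparable<Object>) val;
      if (stats.min == null || cmp.compareTo(stats.min) < 0) {
        stats.min = cmp;
      }
      if (stats.max == null || cmp.compareTo(stats.max) > 0) {
        stats.max = cmp;
      }
      // totalSize is intentionally not derived from a string form of the value;
      // base-file stats already carry the accurate size.
    }
  }

  public static void main(String[] args) {
    Map<String, Stats> columnToStats = new HashMap<>();
    Map<String, Object> r1 = new HashMap<>();
    r1.put("id", 9);
    r1.put("name", null);
    Map<String, Object> r2 = new HashMap<>();
    r2.put("id", 10);
    r2.put("name", "b");
    aggregate(r1, columnToStats);
    aggregate(r2, columnToStats);
    Stats id = columnToStats.get("id");
    System.out.println("id min=" + id.min + " max=" + id.max + " size=" + id.totalSize);
  }
}
```

Running `main` prints `id min=9 max=10 size=0`. Had the values been compared as strings, "10" would sort before "9" and min/max would come out swapped.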



