codope commented on a change in pull request #5077:
URL: https://github.com/apache/hudi/pull/5077#discussion_r832179099
##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -1042,22 +1046,27 @@ public static void aggregateColumnStats(IndexedRecord record, Schema schema,
     schema.getFields().forEach(field -> {
       Map<String, Object> columnStats = columnToStats.getOrDefault(field.name(), new HashMap<>());
-      final String fieldVal = getNestedFieldValAsString((GenericRecord) record, field.name(), true, consistentLogicalTimestampEnabled);
+      final Object fieldVal = getNestedFieldVal((GenericRecord) record, field.name(), true, consistentLogicalTimestampEnabled);
+      final Schema fieldSchema = getNestedFieldSchemaFromRecord((GenericRecord) record, field.name());
       // update stats
-      final int fieldSize = fieldVal == null ? 0 : fieldVal.length();
+      final int fieldSize = fieldVal == null ? 0 : StringUtils.objToString(fieldVal).length();
Review comment:
This method is used for merging column stats when handling updates for
deltacommits. In this case, the records are Avro-encoded by design.
I agree with your point about size. While there is no immediate use case for
size, it might be helpful in the future, e.g. for detecting skew. Since we
already have the correct size for base files, do you think we can set it to 0
here for now?
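
To make the proposal concrete, here is a minimal sketch of the two options
under discussion: deriving the size from the value's string form, or
recording 0 until a real use case appears. The class name, stat keys, and
the trackSize flag are hypothetical and are not part of the Hudi code base;
String.valueOf stands in for StringUtils.objToString.

```java
import java.util.HashMap;
import java.util.Map;

public class ColumnStatsSketch {
    // Hypothetical stat keys, chosen for illustration only.
    static final String VALUE_COUNT = "value_count";
    static final String NULL_COUNT = "null_count";
    static final String TOTAL_SIZE = "total_size";

    // Returns 0 when size tracking is off or the value is null;
    // otherwise the length of the value's string form.
    static int fieldSize(Object fieldVal, boolean trackSize) {
        if (!trackSize || fieldVal == null) {
            return 0;
        }
        return String.valueOf(fieldVal).length();
    }

    // Folds one field value into the running stats for its column.
    static void aggregate(Map<String, Map<String, Long>> columnToStats,
                          String column, Object fieldVal, boolean trackSize) {
        // computeIfAbsent stores the new map, so updates are not lost.
        Map<String, Long> stats =
            columnToStats.computeIfAbsent(column, k -> new HashMap<>());
        stats.merge(VALUE_COUNT, 1L, Long::sum);
        stats.merge(NULL_COUNT, fieldVal == null ? 1L : 0L, Long::sum);
        stats.merge(TOTAL_SIZE, (long) fieldSize(fieldVal, trackSize), Long::sum);
    }
}
```

With trackSize set to false, the total size stays at 0 for log-file records
while the value and null counts are still aggregated, which is the behavior
suggested above.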