codope commented on a change in pull request #5077:
URL: https://github.com/apache/hudi/pull/5077#discussion_r832172618
##########
File path:
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -1042,22 +1046,27 @@ public static void aggregateColumnStats(IndexedRecord
record, Schema schema,
schema.getFields().forEach(field -> {
Map<String, Object> columnStats =
columnToStats.getOrDefault(field.name(), new HashMap<>());
- final String fieldVal = getNestedFieldValAsString((GenericRecord)
record, field.name(), true, consistentLogicalTimestampEnabled);
+ final Object fieldVal = getNestedFieldVal((GenericRecord) record,
field.name(), true, consistentLogicalTimestampEnabled);
Review comment:
> Instead we can directly access the field value and perform logical
type conversion by invoking convertValueForSpecificDataTypes directly.
> As such my proposal is to limit min/max stats only for primitive types.
Will do.
> `fieldVal` we're getting should be of type `Comparable<?>`, otherwise we
set it to null (since there's no plausible order we can impose on these
> I don't think we need this method. Instead we should make sure at the
place where we collect min/max stats that they're proper Java objects, and
simply treat min/max as Comparable
How do you get a `Comparable<?>` given the record is an avro
`GenericRecord`? I can refactor in a way such that we don't have to convert
while merging stats, and hide the conversion. Still, we need to infer the java
type from avro type and do the conversion. What I am thinking is to add another
method, say `convertValueToNativeJavaType` in `HoodieAvroUtils` which will
```
Comparable<?> convertValueToNativeJavaType(Schema fieldSchema, Object val) {
Object v = convertValueForSpecificDataTypes(fieldSchema, val)
return convertToNativeJavaType(fieldSchema, v);
}
and then in HoodieTableMetadataUtil#aggregateColumnStats
Comparable<?> fieldVal = convertValueToNativeJavaType(field.schema(),
genericRecord.get(field.name()));
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]