cshuo commented on code in PR #13411:
URL: https://github.com/apache/hudi/pull/13411#discussion_r2139374889
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java:
##########
@@ -376,19 +377,24 @@ public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord oldRecord, S
}
@Override
- public Option<IndexedRecord> getInsertValue(Schema schemaIgnored, Properties propertiesIgnored) throws IOException {
+ public Option<IndexedRecord> getInsertValue(Schema schema, Properties propertiesIgnored) throws IOException {
if (key == null || this.isDeletedRecord) {
return Option.empty();
}
HoodieMetadataRecord record = new HoodieMetadataRecord(key, type, filesystemMetadata, bloomFilterMetadata,
columnStatMetadata, recordIndexMetadata, secondaryIndexMetadata);
- return Option.of(record);
+ if (schema == null || HoodieMetadataRecord.getClassSchema().equals(schema)) {
+ // If the schema is same or none is provided, we can return the record directly
+ return Option.of(record);
+ } else {
+ return Option.of(rewriteRecord(record, schema));
Review Comment:
Yes, the schema of the merged result of two MDT log file records is always
`HoodieMetadataRecord#SCHEMA$`, so there is a potential problem: the records
in the final iterator can have different schemas.
IIUC, there is a problem with using `rewriteRecord(record, schema)`: `record`
here is a `HoodieMetadataRecord`, which is an instance of `SpecificRecordBase`,
and the rewrite will skip the metadata fields in the new schema. See details in
https://github.com/apache/hudi/blob/5146dc81f493b4daa06ee4beee21a4f045ad5718/hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java#L953
Is the skipping expected?
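To make the concern concrete, here is a minimal sketch of the skipping behavior as I read it. It is a simplified stand-in built on plain Java collections, not the Avro or Hudi API: the class `RewriteSkipSketch`, the method `rewrite`, and the `isSpecificRecord` flag are hypothetical names for illustration, and the set of metadata field names is an assumed subset.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RewriteSkipSketch {
  // Assumed subset of Hudi metadata field names, used only for this illustration.
  static final Set<String> META_FIELDS = Set.of("_hoodie_commit_time", "_hoodie_record_key");

  // Hypothetical model of a rewrite: copy each field of the new schema from the old record,
  // except metadata fields when the old record is a specific (generated) record.
  static Map<String, Object> rewrite(Map<String, Object> oldRecord,
                                     List<String> newSchemaFields,
                                     boolean isSpecificRecord) {
    Map<String, Object> rewritten = new LinkedHashMap<>();
    for (String field : newSchemaFields) {
      if (isSpecificRecord && META_FIELDS.contains(field)) {
        continue; // skipped: the field is neither copied nor given a default
      }
      // A field missing from the old record is stored as null in this model.
      rewritten.put(field, oldRecord.get(field));
    }
    return rewritten;
  }

  public static void main(String[] args) {
    Map<String, Object> old = Map.of("key", "k1", "type", 1);
    List<String> newSchema = List.of("_hoodie_commit_time", "key", "type");
    System.out.println(rewrite(old, newSchema, true).keySet());
    System.out.println(rewrite(old, newSchema, false).keySet());
  }
}
```

In this model the specific-record path produces a result with no entry for `_hoodie_commit_time`, while the generic path at least carries the field with a null value. If the real rewrite behaves the same way, a `HoodieMetadataRecord` rewritten to a schema that includes metadata fields would leave those fields unpopulated.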