cshuo commented on code in PR #13498:
URL: https://github.com/apache/hudi/pull/13498#discussion_r2184194503
##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/BaseSparkInternalRowReaderContext.java:
##########
@@ -110,6 +112,14 @@ public HoodieRecord<InternalRow> constructHoodieRecord(BufferedRecord<InternalRo
     return new HoodieSparkRecord(hoodieKey, row, HoodieInternalRowUtils.getCachedSchema(schema), false);
   }
+  @Override
+  public InternalRow constructEngineRecord(Schema schema, List<Object> values) {
+    if (schema.getFields().size() != values.size()) {
+      throw new IllegalArgumentException("Schema field count and values size must match.");
+    }
+    return new GenericInternalRow(values.toArray());
Review Comment:
> If toBinary is always called before putting the record to the spillable map
Yes, `toBinary` is always called before the record is put into the spillable
map, in order to reduce spilling.
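
To illustrate why a compact form reduces spilling, here is a minimal sketch. It is not Hudi code: `BoundedMap` is a hypothetical stand-in for a memory-budgeted map, `looseEncode` assumes a 32-byte per-field cost for an unconverted record, and this `toBinary` is a toy encoder that only shares its name with the real method.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SpillSketch {
    /** Hypothetical stand-in for a spillable map: keeps entries in memory up to a byte budget. */
    static final class BoundedMap {
        private final long budgetBytes;
        private long usedBytes = 0;
        private final Map<String, byte[]> inMemory = new HashMap<>();
        // Stands in for the on-disk portion; only the keys are tracked here.
        private final List<String> spilledKeys = new ArrayList<>();

        BoundedMap(long budgetBytes) {
            this.budgetBytes = budgetBytes;
        }

        void put(String key, byte[] payload) {
            if (usedBytes + payload.length <= budgetBytes) {
                inMemory.put(key, payload);
                usedBytes += payload.length;
            } else {
                spilledKeys.add(key); // over budget: this entry would go to disk
            }
        }

        int spilledCount() {
            return spilledKeys.size();
        }
    }

    /** Assumed cost model for an unconverted record: 32 bytes per field. */
    static byte[] looseEncode(long[] values) {
        return new byte[values.length * 32];
    }

    /** Toy compact encoding: 8 bytes per long field. */
    static byte[] toBinary(long[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        return buf.array();
    }

    /** Inserts 20 four-field records into a 1024-byte map and returns how many spilled. */
    static int spillsFor(boolean compact) {
        BoundedMap map = new BoundedMap(1024);
        for (int i = 0; i < 20; i++) {
            long[] row = {i, i * 2L, i * 3L, i * 4L};
            map.put("key" + i, compact ? toBinary(row) : looseEncode(row));
        }
        return map.spilledCount();
    }

    public static void main(String[] args) {
        System.out.println("spilled without toBinary: " + spillsFor(false));
        System.out.println("spilled with toBinary:    " + spillsFor(true));
    }
}
```

Under these assumed sizes, the unconverted records exhaust the budget after 8 entries, while all 20 compact records stay in memory. The real ratio depends on the actual row layout and the map's size estimator.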
> some benchmarking around FG reader which you can re-use
I previously benchmarked the spilling of `ExternalSpillableMap` for the FG
reader used in compaction, with and without `toBinary`, and I can share the
doc/scripts if necessary. cc @linliu-code