cshuo commented on code in PR #13408:
URL: https://github.com/apache/hudi/pull/13408#discussion_r2139076809


##########
hudi-common/src/main/java/org/apache/hudi/common/engine/HoodieReaderContext.java:
##########
@@ -183,6 +187,14 @@ public Option<Predicate> getKeyFilterOpt() {
     return keyFilterOpt;
   }
 
+  public SizeEstimator<BufferedRecord<T>> getRecordSizeEstimator() {
+    return new HoodieRecordSizeEstimator<>(schemaHandler.getRequiredSchema());
+  }
+
+  public CustomSerializer<BufferedRecord<T>> getRecordSerializer() {
+    return new DefaultSerializer<>();

Review Comment:
   I've run a local micro-benchmark comparing `DefaultSerializer` and 
`BufferedRecordSerializer(DefaultRecordSerializer)`, using 1 million Flink 
`GenericRowData` records.
   
   ```
   public class DefaultRecordSerializer<T> implements RecordSerializer<T> {
     @Override
     public byte[] serialize(T record) {
       try {
         return SerializationUtils.serialize(record);
       } catch (IOException e) {
         throw new RuntimeException(e);
       }
     }
   
     @Override
     public T deserialize(byte[] bytes, int schemaId) {
       return SerializationUtils.deserialize(bytes);
     }
   }
   ```
   
   Records:
   ```
         GenericRowData record = new GenericRowData(5);
         record.setField(0, "lily");
         record.setField(1, 23);
         record.setField(2, "shanghai");
         record.setField(3, 1000L);
         record.setField(4, "20240101");
         BufferedRecord<GenericRowData> bufferedRecord =
             new BufferedRecord<>("lily", 1000L, record, 1, false);
   ```
   
   Results:
   ~~Legacy default: 1439s~~
   Legacy default: 1082s
   Legacy default: 982s
   Legacy default: 969s
   Legacy default: 958s
   Avg: ~998s (struck-out first run excluded)
   
   ~~New: 1164s~~
   New: 1164s
   New: 1144s
   New: 1155s
   New: 1165s
   Avg: 1157s (struck-out first run excluded)
   
   It seems the legacy `DefaultSerializer` performs slightly better (about 14% 
less time on average in these runs), so we should probably keep 
`DefaultSerializer` as the default until we implement efficient 
`RecordSerializer`s for the other engine-specific rows.
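
   For anyone who wants to repeat this kind of comparison, here is a minimal sketch of the measurement pattern (time N serialize/deserialize round trips, drop the first run as warm-up, average the rest). It is not the harness used for the numbers above: `SerializerBench`, `SampleRecord`, and the helper methods are hypothetical names, and plain JDK serialization stands in for the Hudi serializers so the snippet runs without Hudi or Flink on the classpath.

   ```java
   import java.io.ByteArrayInputStream;
   import java.io.ByteArrayOutputStream;
   import java.io.IOException;
   import java.io.ObjectInputStream;
   import java.io.ObjectOutputStream;
   import java.io.Serializable;
   import java.io.UncheckedIOException;
   import java.util.function.Function;

   public class SerializerBench {

     /** Hypothetical stand-in for the buffered record; not a Hudi or Flink class. */
     static final class SampleRecord implements Serializable {
       private static final long serialVersionUID = 1L;
       final String name;
       final int age;
       final String city;
       final long orderingValue;
       final String partition;

       SampleRecord(String name, int age, String city, long orderingValue, String partition) {
         this.name = name;
         this.age = age;
         this.city = city;
         this.orderingValue = orderingValue;
         this.partition = partition;
       }
     }

     /** Plain JDK serialization, used here only as a placeholder serializer. */
     static byte[] javaSerialize(Object value) {
       try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes)) {
         out.writeObject(value);
         out.flush();
         return bytes.toByteArray();
       } catch (IOException e) {
         throw new UncheckedIOException(e);
       }
     }

     static Object javaDeserialize(byte[] data) {
       try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
         return in.readObject();
       } catch (IOException e) {
         throw new UncheckedIOException(e);
       } catch (ClassNotFoundException e) {
         throw new IllegalStateException(e);
       }
     }

     /** Times the given number of serialize/deserialize round trips, in nanoseconds. */
     static <T> long timeRoundTrips(T record, Function<T, byte[]> serialize,
                                    Function<byte[], T> deserialize, int iterations) {
       long checksum = 0; // consumed below so the JIT cannot discard the loop body
       long start = System.nanoTime();
       for (int i = 0; i < iterations; i++) {
         byte[] bytes = serialize.apply(record);
         T copy = deserialize.apply(bytes);
         checksum += bytes.length + (copy == null ? 0 : 1);
       }
       long elapsed = System.nanoTime() - start;
       if (iterations > 0 && checksum == 0) {
         throw new IllegalStateException("serializer produced no output");
       }
       return elapsed;
     }

     /** Average of all runs except the first, which is treated as warm-up. */
     static double averageExcludingFirst(long[] runs) {
       if (runs.length < 2) {
         throw new IllegalArgumentException("need at least two runs");
       }
       long sum = 0;
       for (int i = 1; i < runs.length; i++) {
         sum += runs[i];
       }
       return (double) sum / (runs.length - 1);
     }

     public static void main(String[] args) {
       SampleRecord record = new SampleRecord("lily", 23, "shanghai", 1000L, "20240101");
       int iterations = 100_000; // assumption: smaller than the 1 million used above
       long[] runs = new long[5];
       for (int r = 0; r < runs.length; r++) {
         runs[r] = timeRoundTrips(record, SerializerBench::javaSerialize,
             b -> (SampleRecord) javaDeserialize(b), iterations);
       }
       System.out.printf("Avg: %.1f ms%n", averageExcludingFirst(runs) / 1_000_000.0);
     }
   }
   ```

   To compare two serializers, call `timeRoundTrips` once per implementation with the same record and iteration count. Applying `averageExcludingFirst` to the figures posted above gives 997.75 for the legacy serializer and 1157 for the new one. A JMH benchmark would give more trustworthy numbers than a hand-rolled loop like this.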


