vamshikrishnakyatham commented on code in PR #13862:
URL: https://github.com/apache/hudi/pull/13862#discussion_r2331767489


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java:
##########
@@ -108,24 +113,32 @@ protected ByteArrayOutputStream serializeRecords(List<HoodieRecord> records, Hoo
       // 1. Write out the log block version
       output.writeInt(HoodieLogBlock.version);
 
-      // 2. Write total number of records
-      output.writeInt(records.size());
-
-      // 3. Write the records
+      // 2. Pre-serialize records to handle and get accurate count
       Properties props = initProperties(storage.getConf());
+      List<ByteArrayOutputStream> serializedRecords = new ArrayList<>();
       for (HoodieRecord<?> s : records) {
         try {
           // Encode the record into bytes
           // Spark Record not support write avro log
           ByteArrayOutputStream data = s.getAvroBytes(schema, props);
-          // Write the record size
-          output.writeInt(data.size());
-          // Write the content
-          data.writeTo(output);
+          serializedRecords.add(data);
         } catch (IOException e) {
           throw new HoodieIOException("IOException converting HoodieAvroDataBlock to bytes", e);
+        } catch (Exception e) {
+          LOG.warn("Skipping record during serialization: {}. This may be due to concurrent archiving race conditions. "

Review Comment:
   I suspect, though I'm not sure, that HoodieCommitMetadata may end up with 
null fields because records are being skipped silently here. I will add some 
logging and check on CI, and I will create a new Jira task to track this.
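
   For reference, the two-pass pattern in the hunk above (serialize first, 
then write a count that matches what was actually kept) can be sketched in 
isolation. This is a minimal illustration, not the Hudi implementation: 
`TwoPassBlockWriter`, `Encodable` and the `VERSION` value are hypothetical 
stand-ins, and the record bytes are arbitrary.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "pre-serialize, then write count + payloads".
class TwoPassBlockWriter {

  /** Hypothetical stand-in for a record that can encode itself to bytes. */
  interface Encodable {
    byte[] encode() throws IOException;
  }

  // Placeholder value for illustration; not Hudi's actual log block version.
  static final int VERSION = 3;

  static byte[] serialize(List<Encodable> records) {
    // Pass 1: encode every record up front so the count written later
    // reflects only the records that were actually serialized.
    List<byte[]> encoded = new ArrayList<>();
    int skipped = 0;
    for (Encodable record : records) {
      try {
        encoded.add(record.encode());
      } catch (IOException e) {
        // I/O failures stay fatal, as in the diff.
        throw new UncheckedIOException("IOException converting block to bytes", e);
      } catch (RuntimeException e) {
        // Any other failure skips the record. This is the silent data
        // loss the review comment is concerned about.
        skipped++;
      }
    }
    if (skipped > 0) {
      System.err.println("Skipped " + skipped + " of " + records.size() + " records");
    }

    // Pass 2: version, record count, then each (length, payload) pair.
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(baos)) {
      out.writeInt(VERSION);
      out.writeInt(encoded.size());
      for (byte[] bytes : encoded) {
        out.writeInt(bytes.length);
        out.write(bytes);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return baos.toByteArray();
  }
}
```

   The header count stays consistent with the payload, but a caller cannot 
tell from the returned bytes alone that records were dropped, so downstream 
metadata built from the original input could disagree with the block 
contents.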



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
