Re: [PR] feat(metadata): optimize RLI bootstrap by sizing file groups from base file footer row counts [hudi]

via GitHub Sat, 23 May 2026 05:39:09 -0700


codope commented on code in PR #18826:
URL: https://github.com/apache/hudi/pull/18826#discussion_r3292775688



##########
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/functional/TestHoodieBackedMetadata.java:
##########
@@ -4379,6 +4379,104 @@ private void changeTableVersion(HoodieTableVersion version) throws IOException {
     }
   }
 
+  /**
+   * Validates that RLI initialization estimates file group count from base file footer metadata
+   * (instead of materializing and counting records) when min != max file group count.
+   */
+  @ParameterizedTest
+  @EnumSource(HoodieTableType.class)
+  public void testRecordIndexFileGroupEstimation(HoodieTableType tableType) throws Exception {

Review Comment:
   Both this and `testRecordIndexWithFixedFileGroupCount` would pass even if
`estimateRecordCountFromBaseFiles` returned 0, right? With 200 inserts, the
default `maxFileGroupSizeBytes`, and min=1/max=10, the estimator would return
minFileGroupCount=1 regardless, so the assertion is satisfied either way. I think
we should configure a small `record.index.max.file.group.size.bytes` (e.g. a few
KB) and/or a low average record size so that the record count actually drives
the result. Wdyt?
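To make the argument concrete, here is a minimal sketch of a clamped estimate, assuming the file group count is roughly ceil(recordCount * avgRecordSize / maxFileGroupSizeBytes) clamped into [min, max]. The class `FileGroupEstimateSketch`, its method, the 48-byte average record size, and the 1 GiB "large default" are all hypothetical illustrations. They do not reproduce Hudi's actual code, configs, or defaults.

```java
/**
 * Hypothetical sketch of a clamped file-group estimate; not Hudi's implementation.
 */
public final class FileGroupEstimateSketch {

  // Assumed shape of the estimate: total bytes divided by the per-file-group
  // size limit (rounded up), then clamped into [minFileGroupCount, maxFileGroupCount].
  static int estimateFileGroupCount(long recordCount, long avgRecordSizeBytes,
                                    long maxFileGroupSizeBytes,
                                    int minFileGroupCount, int maxFileGroupCount) {
    long totalBytes = recordCount * avgRecordSizeBytes;
    // Ceiling division, valid for non-negative operands.
    long raw = (totalBytes + maxFileGroupSizeBytes - 1) / maxFileGroupSizeBytes;
    // Clamp the raw estimate into [min, max].
    return (int) Math.max(minFileGroupCount, Math.min(maxFileGroupCount, raw));
  }

  public static void main(String[] args) {
    long avgRecordSize = 48L;     // hypothetical average record size in bytes
    long largeLimit = 1L << 30;   // assumed large default limit (1 GiB), illustrative only
    long smallLimit = 2L * 1024;  // "a few KB", as suggested in the review

    // Large limit: 200 records and 0 records both clamp to min=1,
    // so an assertion on the result cannot tell them apart.
    System.out.println(estimateFileGroupCount(200, avgRecordSize, largeLimit, 1, 10)); // prints 1
    System.out.println(estimateFileGroupCount(0, avgRecordSize, largeLimit, 1, 10));   // prints 1

    // Small limit: 200 * 48 = 9600 bytes, ceil(9600 / 2048) = 5,
    // while 0 records still clamps to 1.
    System.out.println(estimateFileGroupCount(200, avgRecordSize, smallLimit, 1, 10)); // prints 5
    System.out.println(estimateFileGroupCount(0, avgRecordSize, smallLimit, 1, 10));   // prints 1
  }
}
```

Under these assumptions, with the large limit an estimator that wrongly returns 0 is indistinguishable from a correct one. With a few-KB limit the two diverge (5 vs 1), which is the difference a test assertion could actually check.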



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
