codope commented on code in PR #18826:
URL: https://github.com/apache/hudi/pull/18826#discussion_r3292775688
##########
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/functional/TestHoodieBackedMetadata.java:
##########
@@ -4379,6 +4379,104 @@ private void changeTableVersion(HoodieTableVersion version) throws IOException {
}
}
+ /**
+ * Validates that RLI initialization estimates file group count from base file footer metadata
+ * (instead of materializing and counting records) when min != max file group count.
+ */
+ @ParameterizedTest
+ @EnumSource(HoodieTableType.class)
+ public void testRecordIndexFileGroupEstimation(HoodieTableType tableType) throws Exception {
Review Comment:
Both this and `testRecordIndexWithFixedFileGroupCount` would pass even if
`estimateRecordCountFromBaseFiles` returned 0, right? With 200 inserts, the default
`maxFileGroupSizeBytes`, and min=1/max=10, the estimator would return
minFileGroupCount=1 regardless, so the assertion is satisfied either way. I think we
should configure a small `record.index.max.file.group.size.bytes` (e.g. a few KB)
and/or a low average record size so that the record count actually drives the result. Wdyt?
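To illustrate the concern, here is a minimal sketch that assumes the estimator computes ceil(recordCount * avgRecordSize / maxFileGroupSizeBytes) and clamps the result to [min, max]. The helper `estimateFileGroupCount`, the 1 GiB "large" max size, and the 48-byte average record size are hypothetical illustration values, not Hudi's actual implementation or defaults.

```java
// Hypothetical sketch of a clamped file-group estimate; NOT Hudi's actual implementation.
final class FileGroupEstimateSketch {

  // Hypothetical helper: ceil(recordCount * avgRecordSize / maxFileGroupSizeBytes),
  // clamped to [minFileGroupCount, maxFileGroupCount].
  static int estimateFileGroupCount(long recordCount, int avgRecordSizeBytes,
                                    long maxFileGroupSizeBytes,
                                    int minFileGroupCount, int maxFileGroupCount) {
    long totalBytes = recordCount * avgRecordSizeBytes;
    // Integer ceiling division.
    long needed = (totalBytes + maxFileGroupSizeBytes - 1) / maxFileGroupSizeBytes;
    return (int) Math.max(minFileGroupCount, Math.min(maxFileGroupCount, needed));
  }

  public static void main(String[] args) {
    long largeMaxBytes = 1L << 30; // assumed large max file group size: 1 GiB
    long smallMaxBytes = 4L * 1024; // "a few KB", as suggested in the review
    int avgRecordSize = 48;         // assumed average record size in bytes

    // Large max size: a broken count (0) and a correct count (200) both clamp to min = 1,
    // so the test cannot tell them apart.
    System.out.println(estimateFileGroupCount(0, avgRecordSize, largeMaxBytes, 1, 10));   // 1
    System.out.println(estimateFileGroupCount(200, avgRecordSize, largeMaxBytes, 1, 10)); // 1

    // Small max size: 200 * 48 = 9600 bytes needs ceil(9600 / 4096) = 3 file groups,
    // while a broken count of 0 still yields 1, so the assertion can catch the bug.
    System.out.println(estimateFileGroupCount(0, avgRecordSize, smallMaxBytes, 1, 10));   // 1
    System.out.println(estimateFileGroupCount(200, avgRecordSize, smallMaxBytes, 1, 10)); // 3
  }
}
```

With a few-KB limit, asserting that the resulting file group count is greater than the minimum would fail if the record count were silently 0.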
--
This is an automated message from the Apache Git Service.