yihua commented on code in PR #13927:
URL: https://github.com/apache/hudi/pull/13927#discussion_r2366410259


##########
hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroReaderContext.java:
##########
@@ -79,7 +80,24 @@ public HoodieAvroReaderContext(
       HoodieTableConfig tableConfig,
       Option<InstantRange> instantRangeOpt,
       Option<Predicate> filterOpt) {
-    this(storageConfiguration, tableConfig, instantRangeOpt, filterOpt, Collections.emptyMap());
+    this(storageConfiguration, tableConfig, instantRangeOpt, filterOpt, Collections.emptyMap(), tableConfig.getPayloadClass(), new HoodieConfig());

Review Comment:
   Should `new HoodieConfig()` be avoided here, i.e., should the caller always pass a config explicitly? On the write path that would be the write config (which carries the HFile block cache and other configs); on the read path, where there is no write config, it would be a config newly constructed from the Spark read options or the storage/hadoop configs. That would avoid misuse of the constructor.
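
   To make the concern concrete, here is a minimal sketch of the suggested pattern. It is illustrative only: the class and method names are hypothetical and plain `Properties` stands in for `HoodieConfig`; this is not Hudi code. With no defaulting overload, every call site must state where its config comes from.

   ```java
   import java.util.Objects;
   import java.util.Properties;

   // Hypothetical sketch: the reader context never invents a default config.
   // The caller must pass either the write config, or a config assembled from
   // Spark read options / storage configs on the read path.
   class ReaderContextSketch {
     private final Properties readerConfig;

     // No zero-arg or defaulting overload: callers must be explicit.
     ReaderContextSketch(Properties readerConfig) {
       this.readerConfig = Objects.requireNonNull(
           readerConfig, "reader config must be supplied by the caller");
     }

     String get(String key, String fallback) {
       return readerConfig.getProperty(key, fallback);
     }
   }
   ```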



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala:
##########
@@ -843,6 +843,15 @@ object HoodieBaseRelation extends SparkAdapterSupport {
       val hoodieConfig = new HoodieConfig()
      hoodieConfig.setValue(USE_NATIVE_HFILE_READER,
        options.getOrElse(USE_NATIVE_HFILE_READER.key(), USE_NATIVE_HFILE_READER.defaultValue().toString))
+
+      // Add HFile cache configurations from options
+      hoodieConfig.setValue(HFILE_BLOCK_CACHE_ENABLED,
+        options.getOrElse(HFILE_BLOCK_CACHE_ENABLED.key(), HFILE_BLOCK_CACHE_ENABLED.defaultValue().toString))
+      hoodieConfig.setValue(HFILE_BLOCK_CACHE_SIZE,
+        options.getOrElse(HFILE_BLOCK_CACHE_SIZE.key(), HFILE_BLOCK_CACHE_SIZE.defaultValue().toString))
+      hoodieConfig.setValue(HFILE_BLOCK_CACHE_TTL_MINUTES,
+        options.getOrElse(HFILE_BLOCK_CACHE_TTL_MINUTES.key(), HFILE_BLOCK_CACHE_TTL_MINUTES.defaultValue().toString))

Review Comment:
   Extract this to a new util for reuse by the different callers of `HoodieIOFactory.getIOFactory(storage).getReaderFactory(HoodieRecordType.AVRO).getFileReader`?
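
   If extracted, the util could look roughly like the following sketch. It is illustrative only: the class and method names are hypothetical, plain `Map`/`Properties` stand in for Hudi's `HoodieConfig`/`ConfigProperty`, and the key strings and defaults are placeholders, not Hudi's actual config keys.

   ```java
   import java.util.Map;
   import java.util.Properties;

   // Hypothetical util: builds the HFile reader config from caller-supplied
   // options, falling back to defaults, so every caller of getFileReader can
   // share one code path instead of re-assembling the config inline.
   class HFileReaderConfigUtils {

     // (key, default) pairs standing in for the ConfigProperty constants
     // (USE_NATIVE_HFILE_READER, HFILE_BLOCK_CACHE_ENABLED, etc.)
     private static final String[][] HFILE_CONFIGS = {
         {"hoodie.hfile.use.native.reader", "false"},
         {"hoodie.hfile.block.cache.enabled", "true"},
         {"hoodie.hfile.block.cache.size", "1024"},
         {"hoodie.hfile.block.cache.ttl.minutes", "60"},
     };

     static Properties buildHFileReaderConfig(Map<String, String> options) {
       Properties config = new Properties();
       for (String[] kv : HFILE_CONFIGS) {
         config.setProperty(kv[0], options.getOrDefault(kv[0], kv[1]));
       }
       return config;
     }
   }
   ```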



##########
hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroReaderContext.java:
##########
@@ -152,7 +172,7 @@ public ClosableIterator<IndexedRecord> getFileRecordIterator(
     } else {
       HoodieFileFormat fileFormat = isMultiFormat && !isLogFile ? HoodieFileFormat.fromFileExtension(filePath.getFileExtension()) : baseFileFormat;
       reader = (HoodieAvroFileReader) HoodieIOFactory.getIOFactory(storage)
-          .getReaderFactory(HoodieRecord.HoodieRecordType.AVRO).getFileReader(new HoodieConfig(),
+          .getReaderFactory(HoodieRecord.HoodieRecordType.AVRO).getFileReader(hoodieConfig,

Review Comment:
   Should `HoodieHFileDataBlock` also be fixed? `HoodieHFileDataBlock` constructs the HFile reader config as below, which does not consider the HFile block cache configs. Also check whether any other callers of `HoodieIOFactory.getIOFactory(storage).getReaderFactory(HoodieRecordType.AVRO).getFileReader` need to be fixed for this.
   ```java
   private HoodieConfig getHFileReaderConfig(boolean useNativeHFileReader) {
     HoodieConfig config = new HoodieConfig();
     config.setValue(
         HoodieReaderConfig.USE_NATIVE_HFILE_READER, Boolean.toString(useNativeHFileReader));
     return config;
   }
   ```
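
   One possible shape for the fix, again as a hedged sketch rather than a definitive implementation: `Properties` stands in for `HoodieConfig`, the key strings are placeholders rather than Hudi's real config keys, and `surroundingConfig` stands for whatever config object `HoodieHFileDataBlock` has access to.

   ```java
   import java.util.Properties;

   // Hypothetical sketch of a getHFileReaderConfig that propagates the block
   // cache settings instead of dropping them.
   class HFileDataBlockConfigSketch {

     static Properties getHFileReaderConfig(boolean useNativeHFileReader,
                                            Properties surroundingConfig) {
       Properties config = new Properties();
       config.setProperty("hoodie.hfile.use.native.reader",
           Boolean.toString(useNativeHFileReader));
       // Carry the cache configs through when the surrounding config sets
       // them, rather than silently falling back to defaults.
       String[] cacheKeys = {
           "hoodie.hfile.block.cache.enabled",
           "hoodie.hfile.block.cache.size",
           "hoodie.hfile.block.cache.ttl.minutes",
       };
       for (String key : cacheKeys) {
         String value = surroundingConfig.getProperty(key);
         if (value != null) {
           config.setProperty(key, value);
         }
       }
       return config;
     }
   }
   ```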



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
