nuno-faria commented on code in PR #16971:
URL: https://github.com/apache/datafusion/pull/16971#discussion_r2246052908
##########
datafusion/execution/src/cache/cache_manager.rs:
##########
@@ -86,6 +114,10 @@ pub struct CacheManagerConfig {
/// location.
/// Default is disable.
pub list_files_cache: Option<ListFilesCache>,
+ /// Cache of file-embedded metadata, used to avoid reading it multiple times when processing a
+ /// data file (e.g., Parquet footer and page metadata).
+ /// If not provided, the [`CacheManager`] will create a [`DefaultFilesMetadataCache`].
+ pub file_metadata_cache: Option<FileMetadataCache>,
Review Comment:
My initial idea here was to make it easy to enable the metadata cache in the
default setup, without requiring the user to provide a custom
`FileMetadataCache` when setting up the runtime. This way, the user can simply
run `set datafusion.execution.parquet.cache_metadata = true;` or enable it for
a single file through `ParquetReadOptions`. I don't know if there is a better
approach, though (maybe removing the `Option` from `file_metadata_cache`
altogether?).
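
To make the trade-off concrete, here is a minimal, self-contained sketch of the fallback behaviour described above: the config field stays optional, and the manager substitutes a default cache when none is supplied. Every name in it (`FileMetadataCacheLike`, `DefaultCache`, `ConfigSketch`, `ManagerSketch`) is a hypothetical stand-in rather than DataFusion's actual type, and the byte-vector "metadata" is only a placeholder.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for the cache interface; not DataFusion's real trait.
trait FileMetadataCacheLike: Send + Sync {
    fn get(&self, path: &str) -> Option<Arc<Vec<u8>>>;
    fn put(&self, path: &str, metadata: Arc<Vec<u8>>);
}

// Hypothetical default implementation: an unbounded in-memory map.
#[derive(Default)]
struct DefaultCache {
    entries: Mutex<HashMap<String, Arc<Vec<u8>>>>,
}

impl FileMetadataCacheLike for DefaultCache {
    fn get(&self, path: &str) -> Option<Arc<Vec<u8>>> {
        self.entries.lock().unwrap().get(path).cloned()
    }

    fn put(&self, path: &str, metadata: Arc<Vec<u8>>) {
        self.entries
            .lock()
            .unwrap()
            .insert(path.to_string(), metadata);
    }
}

type SharedCache = Arc<dyn FileMetadataCacheLike>;

// The config keeps the field optional, so users who do not care about the
// cache implementation can leave it as `None`.
#[derive(Default)]
struct ConfigSketch {
    file_metadata_cache: Option<SharedCache>,
}

// The manager always ends up holding a cache: either the one supplied in the
// config or a freshly created default.
struct ManagerSketch {
    file_metadata_cache: SharedCache,
}

impl ManagerSketch {
    fn new(config: ConfigSketch) -> Self {
        let file_metadata_cache = config
            .file_metadata_cache
            .unwrap_or_else(|| Arc::new(DefaultCache::default()) as SharedCache);
        Self { file_metadata_cache }
    }
}
```

With this shape, `ManagerSketch::new(ConfigSketch::default())` yields a usable cache without any extra setup, while a caller who wants a custom implementation passes `Some(...)`. Dropping the `Option` would instead move the default into the config's own `Default` impl; either way the manager never has to handle a missing cache.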
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]