adriangb commented on code in PR #19433:
URL: https://github.com/apache/datafusion/pull/19433#discussion_r2656591729


##########
datafusion/execution/src/cache/cache_unit.rs:
##########
@@ -31,14 +32,54 @@ pub use crate::cache::DefaultFilesMetadataCache;
 
 /// Default implementation of [`FileStatisticsCache`]
 ///
-/// Stores collected statistics for files
+/// Stores collected statistics and file orderings for files.
 ///
-/// Cache is invalided when file size or last modification has changed
+/// Cache is invalidated when file size or last modification has changed.
 ///
 /// [`FileStatisticsCache`]: crate::cache::cache_manager::FileStatisticsCache
 #[derive(Default)]
 pub struct DefaultFileStatisticsCache {
     statistics: DashMap<Path, (ObjectMeta, Arc<Statistics>)>,
+    /// Cached file orderings, keyed by file path.
+    /// Stored separately from statistics to maintain backwards compatibility
+    /// with the FileStatisticsCache trait interface.
+    orderings: DashMap<Path, (ObjectMeta, Option<LexOrdering>)>,

Review Comment:
   I went down a rabbit hole of reworking the cache... there was a lot of weird 
stuff going on (made worse by this PR). Each value in the cache essentially 
comprised 3 parts:
   - Statistics
   - Ordering
   - "Extra" i.e. cache invalidation metadata
   I tried to unify it all into 1 value... but it's a decent amount of code 
churn. I'll split this PR up.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to