LantaoJin opened a new pull request, #78:
URL: https://github.com/apache/datafusion-java/pull/78

   ## Which issue does this PR close?
   
   - Closes #74.
   
   ## Rationale for this change
   
   DataFusion's `RuntimeEnv` accepts a `CacheManagerConfig` with three 
independent caches: the file-embedded metadata cache (parquet footers / page 
metadata), the list-files cache (object-store `LIST` results), and the 
file-statistics cache (per-file row counts and column stats used by the 
planner). The Rust API is 
`RuntimeEnvBuilder::with_cache_manager(CacheManagerConfig)`. The Java binding 
has no surface for any of it — every `SessionContext` ends up with the no-op 
upstream defaults today, so a parquet workload reading the same footer 
thousands of times across queries goes back to the object store every single 
time, and statistics-driven planners can't persist their stats across queries.
   
   This PR adds a typed `cacheManager(CacheManagerOptions)` setter on 
`SessionContextBuilder` that exposes the three caches independently:
   
   ```java
   SessionContext ctx = SessionContext.builder()
       .cacheManager(CacheManagerOptions.builder()
           .fileMetadataCache(64L << 20)                       // 64 MiB cap
           .listFilesCache(8L << 20, Duration.ofMinutes(5))    // 8 MiB cap, 5 min TTL
           .fileStatisticsCache(true)
           .build())
       .build();
   ```
   
   Each setter is independent; calling one doesn't touch the others. Builders 
that never call `cacheManager(...)` see no change — the wire-format 
`cache_manager` field is absent and the JNI layer skips 
`with_cache_manager(...)` entirely, leaving upstream's own `RuntimeEnvBuilder` 
defaults in place.
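
The "independent setters, absent by default" behavior described above can be illustrated with a minimal sketch. It is not the code in this PR: `CacheOptionsSketch`, its accessors, and `isEmpty()` are hypothetical names chosen for illustration, and modeling unset values with `Optional` is an assumption about one reasonable way to do it.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.OptionalLong;

/** Hypothetical stand-in used only for illustration; not the class added by this PR. */
final class CacheOptionsSketch {
    private final OptionalLong fileMetadataLimitBytes;
    private final OptionalLong listFilesLimitBytes;
    private final Optional<Duration> listFilesTtl;
    private final Optional<Boolean> fileStatisticsEnabled;

    private CacheOptionsSketch(Builder b) {
        this.fileMetadataLimitBytes = b.fileMetadataLimitBytes;
        this.listFilesLimitBytes = b.listFilesLimitBytes;
        this.listFilesTtl = b.listFilesTtl;
        this.fileStatisticsEnabled = b.fileStatisticsEnabled;
    }

    static Builder builder() {
        return new Builder();
    }

    OptionalLong fileMetadataLimitBytes() { return fileMetadataLimitBytes; }
    OptionalLong listFilesLimitBytes() { return listFilesLimitBytes; }
    Optional<Duration> listFilesTtl() { return listFilesTtl; }
    Optional<Boolean> fileStatisticsEnabled() { return fileStatisticsEnabled; }

    /** True when no setter was called, so nothing would need to be serialized. */
    boolean isEmpty() {
        return !fileMetadataLimitBytes.isPresent()
            && !listFilesLimitBytes.isPresent()
            && !listFilesTtl.isPresent()
            && !fileStatisticsEnabled.isPresent();
    }

    static final class Builder {
        // Every setting starts out unset; each setter writes only its own fields.
        private OptionalLong fileMetadataLimitBytes = OptionalLong.empty();
        private OptionalLong listFilesLimitBytes = OptionalLong.empty();
        private Optional<Duration> listFilesTtl = Optional.empty();
        private Optional<Boolean> fileStatisticsEnabled = Optional.empty();

        Builder fileMetadataCache(long limitBytes) {
            requireNonNegative(limitBytes);
            this.fileMetadataLimitBytes = OptionalLong.of(limitBytes);
            return this;
        }

        Builder listFilesCache(long limitBytes, Duration ttl) {
            requireNonNegative(limitBytes);
            if (ttl == null) {
                throw new IllegalArgumentException("ttl must not be null");
            }
            this.listFilesLimitBytes = OptionalLong.of(limitBytes);
            this.listFilesTtl = Optional.of(ttl);
            return this;
        }

        Builder fileStatisticsCache(boolean enabled) {
            this.fileStatisticsEnabled = Optional.of(enabled);
            return this;
        }

        CacheOptionsSketch build() {
            return new CacheOptionsSketch(this);
        }

        private static void requireNonNegative(long limitBytes) {
            if (limitBytes < 0) {
                throw new IllegalArgumentException("limitBytes must be >= 0");
            }
        }
    }
}
```

In this sketch, a caller that sets only `fileMetadataCache(...)` leaves the other two caches unset, and a builder with no setter calls produces an object for which `isEmpty()` returns true. That mirrors the case where the `cache_manager` field is left off the wire.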
   
   ## What changes are included in this PR?
   
   - **Proto:** `proto/cache_manager_options.proto`.
   - **Java API:** `org.apache.datafusion.CacheManagerOptions`
   - **Native:** `native/src/cache_manager.rs`
   - **Build wiring:** `proto/cache_manager_options.proto`
   
   ## Are these changes tested?
   
   Yes, 18 new tests across `CacheManagerOptionsTest` and 
`SessionContextCacheManagerTest`.
   
   ## Are there any user-facing changes?
   
   Yes, but additive only — no breaking changes:
   
   - New public class `org.apache.datafusion.CacheManagerOptions` with a static 
`builder()` and three setters.
   - New `SessionContextBuilder.cacheManager(CacheManagerOptions)` setter.
   
   No behavior change for callers that do not invoke the new setter — the 
`cache_manager` field is absent on the wire and the native side leaves 
upstream's `RuntimeEnvBuilder` defaults in place.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

