LantaoJin opened a new issue, #83:
URL: https://github.com/apache/datafusion-java/issues/83
### Is your feature request related to a problem or challenge?
PR **#28** added `tempDirectory(String)` to `SessionContextBuilder` so
callers can route DataFusion's spill files to a chosen directory. That setter
is currently the only Java surface for the disk-manager knobs on DataFusion's
`RuntimeEnvBuilder`. Three real gaps remain, all reachable from Rust and none
from Java:
- **No way to spread spill across multiple volumes.**
`tempDirectory(String)` accepts one path, while upstream's
`DiskManagerMode::Directories(Vec<PathBuf>)` accepts many. Spreading spill I/O
across disks is a real production pattern when a single disk lacks the
bandwidth or capacity.
- **No way to disable spill entirely.** Upstream offers
`DiskManagerMode::Disabled` for memory-only execution: queries that need spill
fail with `ResourcesExhausted` instead of going to disk. This is useful for
keeping latency-sensitive queries in memory, or for environments without a
writable disk.
- **No way to cap the spill volume size.** Upstream has
`RuntimeEnvBuilder::with_max_temp_directory_size(u64)`, but it is not exposed.
Without it, a runaway sort or hash aggregate can fill the spill disk, which on
a multi-tenant node means an outage for co-tenants, not just a failed query.
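To make the three behaviours above concrete, here is a toy Java model of the semantics this issue asks for: several spill directories, one byte cap shared across all of them, and a disabled mode that fails instead of spilling. It is a hypothetical sketch, not DataFusion's Rust `DiskManager` and not a datafusion-java API; the class name `ToySpillManager`, the random per-file directory choice, and the error messages are assumptions made for illustration, and the real disk manager's directory-selection policy may differ.

```java
import java.util.List;
import java.util.Random;

// Hypothetical toy model, NOT DataFusion's DiskManager: it only mimics the
// three behaviours discussed in this issue.
final class ToySpillManager {
    private final List<String> dirs;    // empty list models DiskManagerMode::Disabled
    private final long maxTotalBytes;   // one cap shared by all dirs; Long.MAX_VALUE = no cap
    private final Random random;
    private long usedBytes = 0;

    ToySpillManager(List<String> dirs, long maxTotalBytes, long seed) {
        if (maxTotalBytes <= 0) {
            throw new IllegalArgumentException("cap must be positive");
        }
        this.dirs = List.copyOf(dirs);
        this.maxTotalBytes = maxTotalBytes;
        this.random = new Random(seed);
    }

    /** Reserves space for a new spill file and returns the directory chosen for it. */
    String spill(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        if (dirs.isEmpty()) {
            // Disabled mode: fail the query instead of touching disk.
            throw new IllegalStateException("ResourcesExhausted: spill is disabled");
        }
        if (bytes > maxTotalBytes - usedBytes) {
            // Cumulative cap reached (compared this way to avoid long overflow).
            throw new IllegalStateException(
                "ResourcesExhausted: spill would exceed cap of " + maxTotalBytes + " bytes");
        }
        usedBytes += bytes;
        // Assumed policy: pick a directory uniformly at random for each file.
        return dirs.get(random.nextInt(dirs.size()));
    }

    /** Returns space to the shared budget when a spill file is deleted. */
    void release(long bytes) {
        usedBytes = Math.max(0, usedBytes - bytes);
    }

    long usedBytes() {
        return usedBytes;
    }
}
```

Deducting every file from one shared budget, rather than keeping a budget per directory, is what lets a single cap bound the total spill volume no matter how many directories are configured.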
None of the three can be reached via `setOption(...)`: `setOption` routes
through DataFusion's `ConfigOptions::set(key, value)`, whereas the disk-manager
configuration is supplied when the `RuntimeEnv` is constructed, outside that
namespace.
### Describe the solution you'd like
Three new setters on `SessionContextBuilder`, sitting next to the existing
`tempDirectory(String)`:
```java
// Spread spill across multiple volumes:
SessionContext.builder()
    .tempDirectories(List.of("/data1/df-spill", "/data2/df-spill"))
    .maxTempDirectorySize(20L << 30) // 20 GiB cap (cumulative across all dirs)
    .build();

// Force memory-only execution; queries that would need spill fail fast:
SessionContext.builder()
    .disableSpill()
    .build();

// Single dir + cap (the common case; the existing tempDirectory still works):
SessionContext.builder()
    .tempDirectory("/tmp/df-spill")
    .maxTempDirectorySize(10L << 30) // 10 GiB cap
    .build();
```
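The proposal does not say how the setters interact, for example when `disableSpill()` is combined with `tempDirectory(...)` or `maxTempDirectorySize(...)`. One possible policy, sketched here under that assumption, is to collect the setters and resolve them into a single mode at `build()` time, rejecting contradictory combinations. `SpillSettingsBuilder`, `SpillSettings`, and `Mode` are hypothetical names, not existing datafusion-java classes; a real implementation would presumably translate the resolved settings into the native `RuntimeEnvBuilder` call rather than return a record.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of setter resolution; names are illustrative only.
final class SpillSettingsBuilder {
    /** Default OS temp dir, explicit directories, or no spill at all. */
    enum Mode { OS_TMP, DIRECTORIES, DISABLED }

    record SpillSettings(Mode mode, List<String> dirs, Optional<Long> maxBytes) {}

    private List<String> dirs = List.of();
    private boolean disabled = false;
    private Long maxBytes = null;

    // Assumed: tempDirectory is shorthand for a one-element list; last call wins.
    SpillSettingsBuilder tempDirectory(String dir) {
        return tempDirectories(List.of(dir));
    }

    SpillSettingsBuilder tempDirectories(List<String> dirs) {
        if (dirs.isEmpty()) {
            throw new IllegalArgumentException("need at least one directory");
        }
        this.dirs = List.copyOf(dirs);
        return this;
    }

    SpillSettingsBuilder disableSpill() {
        this.disabled = true;
        return this;
    }

    SpillSettingsBuilder maxTempDirectorySize(long bytes) {
        if (bytes <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.maxBytes = bytes;
        return this;
    }

    SpillSettings build() {
        if (disabled && (!dirs.isEmpty() || maxBytes != null)) {
            // Assumed policy: reject contradictions instead of silently picking one.
            throw new IllegalStateException(
                "disableSpill() conflicts with directory or size settings");
        }
        Mode mode = disabled ? Mode.DISABLED
                  : dirs.isEmpty() ? Mode.OS_TMP
                  : Mode.DIRECTORIES;
        return new SpillSettings(mode, dirs, Optional.ofNullable(maxBytes));
    }
}
```

Failing fast on conflicts keeps a caller from believing spill is disabled while a directory setting quietly re-enables it; letting the last call win is the obvious alternative if a more permissive builder is preferred.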