wirybeaver opened a new issue, #22946:
URL: https://github.com/apache/datafusion/issues/22946
### Is your feature request related to a problem or challenge?

`WindowAggExec` can require buffering an entire window partition before it can evaluate some window functions. Today that buffering is memory-only, so queries over large or skewed partitions can exhaust the configured memory pool even when the runtime has a spill-capable disk manager.

This is surprising because other memory-intensive physical operators in DataFusion, such as sorts and grouped aggregations, already participate in spill paths, but window aggregation does not. A query with a large `PARTITION BY` group, or a query without `PARTITION BY` (where the entire input forms a single partition), can therefore fail with `Resources exhausted` instead of using the configured spill storage.

### Describe the solution you'd like

Add spill support to `WindowAggExec`:

- Track buffered partition batches with a `MemoryReservation`.
- Preserve the current sorted-input partition semantics and finish one partition at a time.
- When the active partition cannot grow its reservation and a disk manager is available, write the buffered partition batches to spill files through the existing `SpillManager`.
- Read spilled partitions back when the partition is ready for window expression evaluation.
- Report spill metrics such as spill count, spilled rows, and spilled bytes.
- Keep the non-spill path unchanged when memory is sufficient.

### Describe alternatives you've considered

A more advanced alternative would be to implement streaming evaluation for more window frame/function combinations. That can reduce memory further for specific functions, but it is a larger semantic change and does not cover all window functions. Operator-level spill support is still useful as a general fallback for large partitions.

### Additional context

This would make `WindowAggExec` behave more consistently with DataFusion's other spill-aware physical operators, and would let memory-limited window queries succeed more often when spill storage is configured.
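The spill flow proposed under "Describe the solution you'd like" (reserve memory per batch, spill the buffered partition when the reservation cannot grow, read it back for evaluation, count spills) can be sketched in plain Rust. This is a minimal, std-only illustration under stated assumptions, not DataFusion code: `Reservation`, `SpillMetrics`, `PartitionBuffer`, `read_spill`, and `sum_over_partition` are all hypothetical names; `Reservation` only imitates the idea of `MemoryReservation` and is not its API; batches are single-column `Vec<i64>` values rather than Arrow `RecordBatch`es; and spill files use an ad hoc length-prefixed layout in `std::env::temp_dir()` instead of going through `SpillManager` and the disk manager.

```rust
use std::fs::{self, File};
use std::io::{self, BufReader, BufWriter, Read, Write};
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical byte-limited reservation (NOT DataFusion's `MemoryReservation`).
struct Reservation { limit: usize, used: usize }
impl Reservation {
    /// Reserves `bytes` and returns true, or reserves nothing and returns false.
    fn try_grow(&mut self, bytes: usize) -> bool {
        let fits = self.used + bytes <= self.limit;
        if fits { self.used += bytes; }
        fits
    }
}

/// Hypothetical counters for the spill metrics named in the issue.
#[derive(Debug, Default)]
struct SpillMetrics { spill_count: usize, spilled_rows: usize, spilled_bytes: usize }

static NEXT_FILE: AtomicUsize = AtomicUsize::new(0);
const ROW_BYTES: usize = std::mem::size_of::<i64>();

/// Buffers the batches of ONE partition; each batch is a toy `Vec<i64>` column.
struct PartitionBuffer {
    reservation: Reservation,
    in_memory: Vec<Vec<i64>>,
    spill_files: Vec<PathBuf>,
    metrics: SpillMetrics,
}

impl PartitionBuffer {
    fn new(limit: usize) -> Self {
        let reservation = Reservation { limit, used: 0 };
        PartitionBuffer { reservation, in_memory: vec![], spill_files: vec![], metrics: SpillMetrics::default() }
    }

    fn push(&mut self, batch: Vec<i64>) -> io::Result<()> {
        let bytes = batch.len() * ROW_BYTES;
        // If the reservation cannot grow, spill everything buffered and retry once.
        if !self.reservation.try_grow(bytes) {
            self.spill()?;
            if !self.reservation.try_grow(bytes) {
                return Err(io::Error::new(io::ErrorKind::Other, "batch exceeds memory limit"));
            }
        }
        self.in_memory.push(batch);
        Ok(())
    }

    /// Writes all buffered batches (length-prefixed, little-endian) to a new temp file.
    fn spill(&mut self) -> io::Result<()> {
        if self.in_memory.is_empty() { return Ok(()); }
        let id = NEXT_FILE.fetch_add(1, Ordering::Relaxed);
        let path = std::env::temp_dir().join(format!("window_spill_{}_{}.bin", std::process::id(), id));
        let mut w = BufWriter::new(File::create(&path)?);
        for batch in self.in_memory.drain(..) {
            w.write_all(&(batch.len() as u64).to_le_bytes())?;
            for v in &batch { w.write_all(&v.to_le_bytes())?; }
            self.metrics.spilled_rows += batch.len();
            self.metrics.spilled_bytes += batch.len() * ROW_BYTES;
        }
        w.flush()?;
        self.metrics.spill_count += 1;
        self.spill_files.push(path);
        self.reservation.used = 0; // the buffered bytes now live on disk
        Ok(())
    }

    /// Reads spill files oldest-first, then the in-memory tail, preserving input order.
    fn finish(mut self) -> io::Result<(Vec<i64>, SpillMetrics)> {
        let mut rows = Vec::new();
        for path in &self.spill_files {
            read_spill(path, &mut rows)?;
            fs::remove_file(path)?;
        }
        rows.extend(self.in_memory.drain(..).flatten());
        Ok((rows, self.metrics))
    }
}

/// Appends the rows of one spill file (u64 row count, then that many i64s, repeated).
fn read_spill(path: &Path, rows: &mut Vec<i64>) -> io::Result<()> {
    let (mut r, mut word) = (BufReader::new(File::open(path)?), [0u8; 8]);
    loop {
        // EOF while reading a length prefix ends the file; other errors propagate.
        if let Err(e) = r.read_exact(&mut word) {
            if e.kind() == io::ErrorKind::UnexpectedEof { return Ok(()); }
            return Err(e);
        }
        let len = u64::from_le_bytes(word) as usize;
        for _ in 0..len { r.read_exact(&mut word)?; rows.push(i64::from_le_bytes(word)); }
    }
}

/// Toy `SUM(x) OVER ()`: no output row can be produced before the whole partition is seen.
fn sum_over_partition(rows: &[i64]) -> Vec<i64> { vec![rows.iter().sum::<i64>(); rows.len()] }
```

With a 32-byte limit, pushing `[1, 2, 3]`, `[4, 5]`, then `[6]` spills the first batch exactly once (3 rows, 24 bytes), and `finish` returns `[1, 2, 3, 4, 5, 6]`; with a generous limit the same calls never touch disk, in the spirit of the "keep the non-spill path unchanged" bullet. Because each spill drains every buffered batch, reading the files oldest-first and then the in-memory tail reproduces input order for the partition. Like the read-back step in the proposal, `finish` re-materializes the whole partition for evaluation; the sketch does not try to bound that second allocation.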
