wirybeaver commented on PR #22947: URL: https://github.com/apache/datafusion/pull/22947#issuecomment-4713568295
Calling out an important point from the discussion in #22946: this PR is not only adding a spill path to the existing implementation. It also changes the `WindowAggExec` execution model.

Current upstream behavior is:

```text
buffer all input -> concat all input -> compute all partitions -> emit once
```

This PR changes it to:

```text
buffer one active partition -> spill it if needed -> compute completed partition -> emit partition output
```

That distinction matters because the current memory pressure is worse than only "large window partition may OOM": today memory usage can scale with the full child input even when every partition is small. With this PR, memory usage is bounded by the active/completed partition workflow, with spill used when the active partition cannot grow its reservation.
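
To make the difference concrete, here is a toy model of the two execution shapes. It is only a sketch and not the code in this PR: `Row`, `buffer_all`, and `per_partition` are hypothetical names, rows are plain tuples rather than Arrow batches, the "window function" is just a per-partition sum, the input is assumed to arrive already grouped by partition key, and spilling is not modeled at all. The only thing it tracks is the peak number of rows held in memory.

```rust
/// Toy row: (partition key, value). Hypothetical stand-in for a RecordBatch row.
type Row = (u32, i64);

/// Models "buffer all input -> compute all partitions -> emit once".
/// Returns the per-partition sums and the peak number of buffered rows.
fn buffer_all(input: &[Row]) -> (Vec<(u32, i64)>, usize) {
    // The whole child input is held before any partition is computed.
    let buffered: Vec<Row> = input.to_vec();
    let peak = buffered.len();

    let mut out: Vec<(u32, i64)> = Vec::new();
    for &(key, value) in &buffered {
        if let Some(last) = out.last_mut() {
            if last.0 == key {
                last.1 += value;
                continue;
            }
        }
        out.push((key, value));
    }
    (out, peak)
}

/// Models "buffer one active partition -> compute completed partition -> emit".
/// Assumes rows with the same key are contiguous in the input.
fn per_partition(input: &[Row]) -> (Vec<(u32, i64)>, usize) {
    let mut active: Vec<Row> = Vec::new();
    let mut peak = 0usize;
    let mut out: Vec<(u32, i64)> = Vec::new();

    for &row in input {
        // A key change means the active partition is complete: compute and emit it.
        if let Some(&(key, _)) = active.first() {
            if key != row.0 {
                out.push((key, active.iter().map(|r| r.1).sum::<i64>()));
                active.clear();
            }
        }
        active.push(row);
        peak = peak.max(active.len());
    }
    // Flush the last partition once the input is exhausted.
    if let Some(&(key, _)) = active.first() {
        out.push((key, active.iter().map(|r| r.1).sum::<i64>()));
    }
    (out, peak)
}

fn main() {
    let input: Vec<Row> = vec![(1, 10), (1, 20), (2, 5), (3, 1), (3, 2), (3, 3)];
    let (all_out, all_peak) = buffer_all(&input);
    let (part_out, part_peak) = per_partition(&input);
    assert_eq!(all_out, part_out);
    println!("buffer-all peak rows: {all_peak}, per-partition peak rows: {part_peak}");
}
```

Both functions produce the same result, but the first one peaks at the total input size while the second peaks at the size of the largest partition. In this toy model, the spill path would only matter once a single active partition no longer fits.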
