wirybeaver commented on PR #22947:
URL: https://github.com/apache/datafusion/pull/22947#issuecomment-4713675339

   One tradeoff worth making explicit: the previous whole-input model may still 
be the fastest path for small/medium inputs when memory is sufficient, because 
it concatenates once, evaluates over larger contiguous batches, and emits fewer 
output batches.
   
   This PR addresses a different failure mode: large/skewed inputs and 
memory-limited execution. If we want to preserve the existing fully in-memory 
fast path, one possible design is to keep the current `WindowAggExec` behavior 
and introduce a separate `SpillingWindowAggExec`, selected by the 
planner/config when spill support is desired.
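
   To make the split concrete, here is a rough sketch of what the planner-side 
selection could look like. Everything in it is hypothetical: `WindowPlanOptions`, 
its fields, `WindowOperatorChoice`, and `choose_window_operator` are illustrative 
names, not existing DataFusion APIs or config keys, and the rule that spilling 
also requires a memory limit is just one assumption about what "spill support is 
desired" could mean.

```rust
/// Hypothetical planner inputs; these are NOT existing DataFusion options.
#[derive(Debug, Clone, Copy)]
pub struct WindowPlanOptions {
    /// Assumed opt-in flag: the user asked for spill support in window aggregation.
    pub enable_window_spill: bool,
    /// Assumed signal: the session runs with a finite memory limit.
    pub memory_limited: bool,
}

/// Which physical operator the planner would pick (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WindowOperatorChoice {
    /// Existing whole-input path, i.e. the current `WindowAggExec` behavior.
    InMemory,
    /// Proposed spill-capable path, i.e. a separate `SpillingWindowAggExec`.
    Spilling,
}

/// Pick the spill-capable operator only when spilling is both requested and
/// useful; otherwise keep the existing fully in-memory fast path.
pub fn choose_window_operator(opts: WindowPlanOptions) -> WindowOperatorChoice {
    if opts.enable_window_spill && opts.memory_limited {
        WindowOperatorChoice::Spilling
    } else {
        WindowOperatorChoice::InMemory
    }
}
```

   With a rule like this, plans that do not opt in are unchanged, so the current 
fast path stays the default and the spill path is only paid for when requested.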
   
   I am also open to exploring whether the spill/streaming work should be 
integrated with `BoundedWindowAggExec`, especially for bounded frames as 
mentioned in #22946. My hesitation is that `BoundedWindowAggExec` already has a 
specialized in-memory state/pruning model, so disk-backed state there likely 
deserves a separate focused design rather than being mixed into this PR.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]