2010YOUY01 commented on issue #22946:
URL: https://github.com/apache/datafusion/issues/22946#issuecomment-4713778114

   > I am also open to exploring whether the spill/streaming work should be 
integrated with `BoundedWindowAggExec`, especially for bounded frames as 
mentioned above. My hesitation is that `BoundedWindowAggExec` already has a 
specialized in-memory state/pruning model, so disk-backed state there likely 
deserves a separate focused design rather than being mixed into the initial 
spill PR.
   
   Here are some quick ideas. I may not have explained everything clearly yet, 
but I’ll put together an epic issue for improving window functions to better 
explain the direction.
   
   I think `BoundedWindowAggExec` should eventually be deprecated in favor of a 
new streaming implementation. My concern is that it assumes the input may not 
be fully ordered by group key and partition key, and that assumption gets in 
the way of a more efficient implementation.
   
   So my preference would be to move directly toward a better streaming 
implementation, rather than adding an intermediate spilling-based step.
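
   To illustrate what a streaming evaluation can look like when the input is 
fully ordered by partition key, here is a minimal sketch. It is not DataFusion 
code: the function name `streaming_window_sum`, the `(partition_key, value)` 
row shape, and the fixed `ROWS BETWEEN n PRECEDING AND CURRENT ROW` frame are 
all assumptions made for the example. The point is that only the rows inside 
the current frame are kept in memory.

```rust
use std::collections::VecDeque;

/// Hypothetical sketch: running SUM over
/// `ROWS BETWEEN preceding PRECEDING AND CURRENT ROW`.
/// Assumes `rows` is already sorted so that each partition key is contiguous.
fn streaming_window_sum(rows: &[(u32, i64)], preceding: usize) -> Vec<i64> {
    let mut out = Vec::with_capacity(rows.len());
    // Only the values inside the current frame are buffered.
    let mut frame: VecDeque<i64> = VecDeque::new();
    let mut sum = 0i64;
    let mut current_key: Option<u32> = None;

    for &(key, value) in rows {
        // A new partition key means the previous partition is complete,
        // so its state can be dropped immediately.
        if current_key != Some(key) {
            frame.clear();
            sum = 0;
            current_key = Some(key);
        }

        // Extend the frame with the current row.
        frame.push_back(value);
        sum += value;

        // Evict the row that just slid out of the frame, if any.
        if frame.len() > preceding + 1 {
            if let Some(old) = frame.pop_front() {
                sum -= old;
            }
        }

        out.push(sum);
    }
    out
}
```

   With this sketch, memory use is bounded by the frame size rather than the 
partition size, which is why a frame that moves unpredictably from row to row 
does not fit this model.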
   
   The workloads that a streaming approach cannot fully solve are:
   
   - a single partition that does not fit in memory
   - and a window frame that moves unpredictably from row to row
   
   Those cases likely need an LRU-like algorithm, but I don’t think that should 
be the current priority. Since the window operator is still fairly basic at the 
moment, I think we should make the in-memory cases better first.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

