ozankabak commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2934960971

   @pepijnve this is a good summary of why I am against changing each operator 
individually:
   
   > IIRC yes -- and of few flavors. Sorting unconditionally suffers from this 
problem. Aggregation suffers from it when its input is unsorted. Windowing is 
prone too, but conditionally for some window frames. Joins will also 
conditionally suffer from this issue, if they collect one side fully. There are 
also other operators that behave this way, but in a data-dependent fashion 
(e.g. partial sorting). I am sure there are also others I can't think of right 
now.
   
   We know exactly when this sort of yielding will be needed (thanks to the 
information exposed to the planner by the `ExecutionPlan` APIs). Therefore, if 
we are to solve this at the stream level, one thing we can contemplate is to 
change the stream object itself (which is used universally by all operators) to 
have two variants (one that encapsulates the yielding logic but introduces a 
small overhead, and one that has no overhead but does not yield). This could be 
done through a generic parameter or an enum. Then, the planner can tweak the 
appropriate operators in the final plan through an `ExecutionPlan` API like 
`with_yielding_streams` to support yielding when necessary.
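   
   To make the enum flavor of this idea concrete, here is a rough, std-only 
sketch. Every name in it (`BatchSource`, `Counter`, `MaybeYielding`, `drain`) 
is hypothetical and none of them are DataFusion APIs. `BatchSource` is a 
simplified stand-in for a record batch stream; a real version would implement 
the `futures` `Stream` trait and wake the task's waker before returning 
`Poll::Pending`.

```rust
use std::task::Poll;

/// Hypothetical, simplified stand-in for a batch stream (no waker plumbing).
trait BatchSource {
    fn poll_next(&mut self) -> Poll<Option<u64>>;
}

/// Toy source that is always ready and emits `next..end`.
struct Counter {
    next: u64,
    end: u64,
}

impl BatchSource for Counter {
    fn poll_next(&mut self) -> Poll<Option<u64>> {
        if self.next < self.end {
            let v = self.next;
            self.next += 1;
            Poll::Ready(Some(v))
        } else {
            Poll::Ready(None)
        }
    }
}

/// The two variants: `Plain` adds no bookkeeping, `Yielding` returns
/// `Pending` once after every `budget` items.
enum MaybeYielding<S> {
    Plain(S),
    Yielding { inner: S, budget: usize, remaining: usize },
}

impl<S: BatchSource> MaybeYielding<S> {
    /// A budget of zero is clamped to one so the stream always makes progress.
    fn with_yielding(inner: S, budget: usize) -> Self {
        let budget = budget.max(1);
        MaybeYielding::Yielding { inner, budget, remaining: budget }
    }
}

impl<S: BatchSource> BatchSource for MaybeYielding<S> {
    fn poll_next(&mut self) -> Poll<Option<u64>> {
        match self {
            MaybeYielding::Plain(inner) => inner.poll_next(),
            MaybeYielding::Yielding { inner, budget, remaining } => {
                if *remaining == 0 {
                    // Budget exhausted: hand control back to the caller once.
                    // A real stream would also wake the task's waker here.
                    *remaining = *budget;
                    return Poll::Pending;
                }
                let out = inner.poll_next();
                if let Poll::Ready(Some(_)) = out {
                    *remaining -= 1;
                }
                out
            }
        }
    }
}

/// Drains a source and returns the items plus the number of `Pending` yields.
fn drain<S: BatchSource>(mut source: S) -> (Vec<u64>, usize) {
    let mut items = Vec::new();
    let mut yields = 0;
    loop {
        match source.poll_next() {
            Poll::Ready(Some(v)) => items.push(v),
            Poll::Ready(None) => return (items, yields),
            Poll::Pending => yields += 1,
        }
    }
}
```

   In this sketch the `Plain` arm is a direct delegation, so the only cost it 
pays is the enum match, while the `Yielding` arm carries the counter. A planner 
hook along the lines of `with_yielding_streams` would then only need to pick 
which variant an operator constructs.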
   
   This route would require some design iterations, and could have unintended 
consequences that I am failing to see right now. In the meantime, solving the 
issue with `YieldStreamExec` is still the best option I see.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


