zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2928856563
> > Only those streams that call poll_next themselves in a loop, and as a consequence may block for an extended period of time, would need to do this. Are there that many of those? > > IIRC yes -- and of few flavors. Sorting unconditionally suffers from this problem. Aggregation suffers from it when its input is unsorted. Windowing is prone too, but conditionally for some window frames. Joins will also conditionally suffer from this issue, if they collect one side fully. There are also other operators that behave this way, but in a data-dependent fashion (e.g. partial sorting). I am sure there are also others I can't think of right now. > > Therefore I still firmly hold the position that we need to either solve the problem universally with some lower-level functionality (which is my preference, but may not be easy to do), or delegate to leaf nodes so that this becomes automatic as long as leaf nodes (whether real sources or synthetic) are implemented to account for it. I remain unconvinced that modifying all operators is the right thing to do. Got it, thank you @ozankabak. Updated, i create the POC of the unified solution, i think it works. I add the physical rule to apply the leaf nodes automatically inserting the yield support. And i testing the reproduce cases, it works well. ```rust Before this PR: SET datafusion.execution.target_partitions = 1; SELECT SUM(value) FROM range(1, 50000000000); It will always stuck until done, we can't ctrl c to stop it. ``` And besides the sql use cases, if we using exec directly, we can just add a simple optimize logic, it will automatically apply to all leaf nodes: ```rust // 3) optimize the plan with WrapLeaves to auto-insert Yield let optimized = WrapLeaves::new() .optimize(aggr.clone(), &ConfigOptions::new())?; ``` If we agree this direction, i will add more testing and polish code. Thanks. cc @alamb @berkaysynnada @pepijnve -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org