zhuqi-lucas commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2928856563

   > > Only those streams that call poll_next themselves in a loop, and as a 
consequence may block for an extended period of time, would need to do this. 
Are there that many of those?
   > 
   > IIRC yes -- and of few flavors. Sorting unconditionally suffers from this 
problem. Aggregation suffers from it when its input is unsorted. Windowing is 
prone too, but conditionally for some window frames. Joins will also 
conditionally suffer from this issue, if they collect one side fully. There are 
also other operators that behave this way, but in a data-dependent fashion 
(e.g. partial sorting). I am sure there are also others I can't think of right 
now.
   > 
   > Therefore I still firmly hold the position that we need to either solve 
the problem universally with some lower-level functionality (which is my 
preference, but may not be easy to do), or delegate to leaf nodes so that this 
becomes automatic as long as leaf nodes (whether real sources or synthetic) are 
implemented to account for it. I remain unconvinced that modifying all 
operators is the right thing to do.
   
   
   Got it, thank you @ozankabak.
   
   
   Updated, i create the POC of the unified solution, i think it works. 
   
   I add the physical rule to apply the leaf nodes automatically inserting the 
yield support. And i testing the reproduce cases, it works well.
   
   ```rust
   Before this PR:
   
   SET datafusion.execution.target_partitions = 1;
   
   SELECT SUM(value) FROM range(1, 50000000000);
   
   It will always stuck until done, we can't ctrl c to stop it.
   ```
   
   
   And besides the sql use cases, if we using exec directly, we can just add a 
simple optimize logic, it will automatically apply to all leaf nodes:
   
   ```rust
   // 3) optimize the plan with WrapLeaves to auto-insert Yield
       let optimized = WrapLeaves::new()
           .optimize(aggr.clone(), &ConfigOptions::new())?;
   ```
   
   If we agree this direction, i will add more testing and polish code. Thanks. 
cc @alamb @berkaysynnada @pepijnve 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to