zhuqi-lucas commented on issue #16353:
URL: https://github.com/apache/datafusion/issues/16353#issuecomment-2962336330

   > At the risk of making myself unpopular, I feel it's relevant to share my 
findings with you guys.
   > 
   > Working on [#16322](https://github.com/apache/datafusion/pull/16322) led 
me into the tokio implementation, in particular it led me to this line in the 
[Chan 
implementation](https://github.com/tokio-rs/tokio/blob/master/tokio/src/sync/mpsc/chan.rs#L295).
 This is the code that handles RecordBatch passing in RecordBatchReceiverStream.
   > 
   > I was immediately reminded of the cancellation discussions. Without 
realizing it DataFusion is actually already using Tokio's coop mechanism. This 
strengthens my belief that the PR that was merged is going about things the 
wrong way. It introduces API which overlaps 100% with something that already 
exists and is already being used. I don't think it's a good idea to have 
multiple mechanisms for the same thing. Pipeline-blocking operators exactly 
match the pattern described in [the Tokio cooperative scheduling 
documentation](https://docs.rs/tokio/latest/tokio/task/coop/index.html#cooperative-scheduling)
 so why would you not use the solution the runtime provides, which you're 
already using in quite a few places (everywhere 
RecordBatchReceiverStream is used)?
   
   Thank you @pepijnve. Do you mean we can replace YieldStream with Tokio's 
coop? Or should we also change the rule for adding Yield?
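
   To make the question concrete, here is a minimal std-only sketch of the budget-based cooperative yielding idea discussed above. It is a toy model, not Tokio's coop API and not DataFusion's YieldStream: `BUDGET`, `BlockingSum`, and `run_counting_yields` are hypothetical names, and the budget of 4 is an arbitrary assumption chosen to keep the example small. It models a pipeline-blocking operator whose input is always ready and that hands control back to its driver once the per-poll budget is used up.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical per-poll budget (an assumption for this sketch only).
const BUDGET: u32 = 4;

/// Models a pipeline-blocking operator: it sums `remaining` items that are
/// always ready, so it would never return `Pending` on its own.
struct BlockingSum {
    remaining: u64,
    sum: u64,
}

impl Future for BlockingSum {
    type Output = u64;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u64> {
        // `BlockingSum` is `Unpin`, so a plain mutable reference is fine.
        let this = self.get_mut();
        // A fresh budget on every poll, as a scheduler would grant it.
        let mut budget = BUDGET;
        while this.remaining > 0 {
            if budget == 0 {
                // Budget exhausted: request another poll, then yield.
                cx.waker().wake_by_ref();
                return Poll::Pending;
            }
            budget -= 1;
            this.sum += this.remaining;
            this.remaining -= 1;
        }
        Poll::Ready(this.sum)
    }
}

/// Waker that does nothing; the driver below simply polls in a loop.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Minimal driver: polls until the future is ready and counts how many
/// times it yielded. Returns `(sum, yields)`.
fn run_counting_yields(items: u64) -> (u64, u32) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(BlockingSum { remaining: items, sum: 0 });
    let mut yields = 0;
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(sum) => return (sum, yields),
            Poll::Pending => yields += 1,
        }
    }
}

fn main() {
    let (sum, yields) = run_counting_yields(10);
    println!("sum = {}, yields = {}", sum, yields);
}
```

   Each `Pending` return is a point where a real runtime could schedule other tasks or observe cancellation. In this sketch the budget check lives inside the operator itself; whether it belongs in a shared resource such as a channel or in a wrapper stream is the design question being discussed here.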
   
   I have not looked into Tokio's coop yet. Maybe we can add a sub-task for 
it and list its benefits:
   
   For example:
   1. Would performance improve with Tokio's coop, as shown by benchmark 
results?
   2. Could we handle more corner cases and automatically handle 
user-defined execs?
   3. Would the API be clearer and easier to use?
   4. Etc.
   
   Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

