zhuqi-lucas commented on issue #16353: URL: https://github.com/apache/datafusion/issues/16353#issuecomment-2962336330
> At the risk of making myself unpopular, I feel it's relevant to share my findings with you guys.
>
> Working on [#16322](https://github.com/apache/datafusion/pull/16322) led me into the tokio implementation, in particular it led me to this line in the [Chan implementation](https://github.com/tokio-rs/tokio/blob/master/tokio/src/sync/mpsc/chan.rs#L295). This is the code that handles RecordBatch passing in RecordBatchReceiverStream.
>
> I was immediately reminded of the cancellation discussions. Without realizing it, DataFusion is actually already using Tokio's coop mechanism. This strengthens my belief that the PR that was merged is going about things the wrong way. It introduces API which overlaps 100% with something that already exists and is already being used. I don't think it's a good idea to have multiple mechanisms for the same thing. Pipeline-blocking operators exactly match the pattern described in [the Tokio cooperative scheduling documentation](https://docs.rs/tokio/latest/tokio/task/coop/index.html#cooperative-scheduling), so why would you not use the solution the runtime provides, which you're already using in quite a few places (everywhere RecordBatchReceiverStream is used)?

Thank you @pepijnve. Do you mean we can replace YieldStream with Tokio's coop, or should we also change the rule that adds Yield?

I have not looked into Tokio's coop yet. Maybe we can add a sub-task for it and list the benefits it would bring, such as:

1. Will performance be better with Tokio's coop, backed by benchmark results?
2. Can we handle more corner cases, and automatically handle user-defined execs?
3. Will we have a clearer and easier API?
4. Etc.

Thanks!
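To make the pattern under discussion concrete, here is a minimal, std-only Rust sketch of budget-based cooperative yielding. It is not Tokio's coop implementation and not DataFusion's YieldStream; `BudgetedSum`, `CountingWaker`, `run_to_completion`, and the budget value are hypothetical names chosen for illustration. The idea it shows is only the general one: an operator that could always make progress voluntarily returns `Poll::Pending` after a fixed amount of work, so the executor gets a chance to run other tasks or notice cancellation.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for a pipeline-blocking operator: it sums the
/// numbers `remaining, remaining - 1, ..., 1` and is always able to make
/// progress, so without a budget it would never return `Poll::Pending`.
struct BudgetedSum {
    remaining: u64,
    total: u64,
    /// Maximum number of items processed per poll (illustrative value).
    budget: u32,
}

impl Future for BudgetedSum {
    type Output = u64;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u64> {
        // All fields are `Unpin`, so a plain mutable reference is fine.
        let this = self.get_mut();
        let mut used = 0;
        while this.remaining > 0 {
            if used == this.budget {
                // Budget exhausted: request another poll, then yield.
                cx.waker().wake_by_ref();
                return Poll::Pending;
            }
            this.total += this.remaining;
            this.remaining -= 1;
            used += 1;
        }
        Poll::Ready(this.total)
    }
}

/// Waker that only counts how often it was woken, i.e. how often the
/// future yielded voluntarily.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Toy executor: polls `fut` until it completes and returns the result
/// together with the number of voluntary yields.
fn run_to_completion(mut fut: BudgetedSum) -> (u64, usize) {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);
    let mut pinned = Pin::new(&mut fut);
    loop {
        if let Poll::Ready(value) = pinned.as_mut().poll(&mut cx) {
            return (value, counter.0.load(Ordering::SeqCst));
        }
        // A real executor could run other tasks or check for
        // cancellation here before polling again.
    }
}
```

For example, `run_to_completion(BudgetedSum { remaining: 10, total: 0, budget: 4 })` processes four items, yields, processes four more, yields again, and then finishes the last two. Whether the budget lives in each operator, as in this sketch, or is provided by the runtime is exactly the design question raised in the quoted comment.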
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org