zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2934552973
> > @pepijnve, an intrusive solution will be a hard sell for me. There are simply too many cases, each with their own context and somewhat specific details.
>
> @ozankabak could you make that a bit more concrete with an example? I want to make sure I understand the concern(s) correctly.
>
> I've been talking to a colleague about this issue in the meantime and we kind of came to the conclusion that it's preferable for each `Stream` implementation to be a well behaved tokio citizen. Streams that poll in a loop and do not yield when their polled child is always ready is something you should avoid in general I think (cfr. the tokio docs on cooperative scheduling). It's also something that's rather easy to spot locally and also to fix locally.
>
> At first glance the YieldExec operator looked like an attractive solution, but it's actually the more intrusive option because it changes the user visible plan tree. Stream implementations on the other hand are, afaik, an internal implementation detail.

@pepijnve @ozankabak What if we don't add a YieldExec operator, and instead identify the operators whose streams need a yielding wrapper and wrap only those? We could add a new boolean field to the execution plan so that it can be inferred whether an operator's stream needs the yielding wrapper. That way we don't change the plan tree that users see, and we still get a unified solution.
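
To make the "wrap the stream, not the plan" idea concrete, here is a rough std-only sketch of a yielding wrapper. Everything in it is hypothetical and for illustration only: `PollNext` is a stand-in for `futures::Stream`, and `AlwaysReady`, `YieldEvery`, and `drive` are made-up names, not DataFusion or tokio APIs. The wrapper counts consecutive ready items and, once a budget is used up, wakes itself and returns `Poll::Pending` so the scheduler can run other tasks.

```rust
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for `futures::Stream`, so the sketch needs only std.
pub trait PollNext {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

/// A child that is always ready: the case that can starve other tasks.
pub struct AlwaysReady {
    remaining: usize,
}

impl PollNext for AlwaysReady {
    type Item = usize;
    fn poll_next(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<usize>> {
        if self.remaining == 0 {
            return Poll::Ready(None);
        }
        self.remaining -= 1;
        Poll::Ready(Some(self.remaining))
    }
}

/// Hypothetical wrapper: after `budget` consecutive ready items it yields once.
/// Assumes `budget > 0`; a zero budget would never make progress.
pub struct YieldEvery<S> {
    inner: S,
    budget: usize,
    used: usize,
}

impl<S: PollNext + Unpin> PollNext for YieldEvery<S> {
    type Item = S::Item;
    fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<S::Item>> {
        if self.used >= self.budget {
            // Budget exhausted: ask to be polled again, then give up the thread.
            self.used = 0;
            cx.waker().wake_by_ref();
            return Poll::Pending;
        }
        let this = &mut *self;
        let polled = Pin::new(&mut this.inner).poll_next(cx);
        match polled {
            Poll::Ready(Some(_)) => this.used += 1,
            // The child yielded or finished on its own, so reset the counter.
            _ => this.used = 0,
        }
        polled
    }
}

/// Waker that only counts how often it was woken.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Polls the wrapped stream to completion; returns (items seen, yields seen).
pub fn drive(total: usize, budget: usize) -> (usize, usize) {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);
    let mut stream = YieldEvery { inner: AlwaysReady { remaining: total }, budget, used: 0 };
    let (mut items, mut yields) = (0, 0);
    loop {
        match Pin::new(&mut stream).poll_next(&mut cx) {
            Poll::Ready(Some(_)) => items += 1,
            Poll::Ready(None) => break,
            Poll::Pending => yields += 1,
        }
    }
    // Every yield must have been paired with a wake-up, or a real runtime would hang.
    assert_eq!(yields, counter.0.load(Ordering::SeqCst));
    (items, yields)
}
```

With a budget of 4, driving 10 items through the wrapper yields twice along the way. The plan tree is untouched in this approach: only the stream returned by an operator would be wrapped, and a per-operator boolean could decide whether the wrapping happens at all.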