pepijnve commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2939607781
@alamb @zhuqi-lucas I've sketched an alternative, 'intrusive' approach at https://github.com/pepijnve/datafusion/commit/9c74748b7fc1033f29ce6053d898f269e5fb4b90 and would love to get your feedback on it. It combines the solution made by @zhuqi-lucas with some inspiration from tokio's coop budget mechanism.

`YieldStream` is the most convenient option for streams implemented as async functions. The countdown `Future` obtained from `PollBudget::consume_budget` gives finer-grained control and is better suited to 'manual' stream implementations. The change to `FilterExecStream` is an example of this: you only pay the counter cost when many consecutive batches are completely filtered out.

This way the nominal value of the poll budget is more of a hint than a hard limit, but I think that's good enough. I'm not sure it makes sense to make the value itself configurable at the `SessionConfig` level, though. Perhaps a `cancel_safety` boolean would be sufficient.
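To make the `FilterExecStream` idea a bit more concrete, here is a minimal, std-only sketch of the pattern. It is not the code from the linked commit: `poll_consume`, `reset`, `FilterSketch`, `NoopWaker` and `drive` are hypothetical names invented for illustration, the "batches" are plain `Vec<i32>` values, and the budget is charged through a poll-style method rather than the `Future` returned by `PollBudget::consume_budget`. It only tries to show why the counter is touched solely when a batch is filtered out completely.

```rust
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical cooperative budget; the real `PollBudget` may differ.
struct PollBudget {
    nominal: u32,
    remaining: u32,
}

impl PollBudget {
    fn new(nominal: u32) -> Self {
        Self { nominal, remaining: nominal }
    }

    /// Spend one unit. When nothing is left, refill, schedule a wake-up and
    /// report `Pending` so the caller hands control back to the runtime.
    fn poll_consume(&mut self, cx: &mut Context<'_>) -> Poll<()> {
        if self.remaining == 0 {
            self.remaining = self.nominal;
            cx.waker().wake_by_ref();
            return Poll::Pending;
        }
        self.remaining -= 1;
        Poll::Ready(())
    }

    fn reset(&mut self) {
        self.remaining = self.nominal;
    }
}

/// Toy stand-in for a manually implemented filter stream.
struct FilterSketch {
    input: std::vec::IntoIter<Vec<i32>>,
    budget: PollBudget,
}

impl FilterSketch {
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<Vec<i32>>> {
        loop {
            let batch = match self.input.next() {
                Some(batch) => batch,
                None => return Poll::Ready(None),
            };
            // Keep only positive values (an arbitrary example predicate).
            let kept: Vec<i32> = batch.into_iter().filter(|v| *v > 0).collect();
            if !kept.is_empty() {
                // Returning a batch already gives control back to the caller,
                // so the budget is reset instead of charged.
                self.budget.reset();
                return Poll::Ready(Some(kept));
            }
            // The batch was filtered out completely: charge the budget
            // before looping again, and yield if it is exhausted.
            if self.budget.poll_consume(cx).is_pending() {
                return Poll::Pending;
            }
        }
    }
}

/// Waker that does nothing; enough for a single-threaded demo driver.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls the sketch to completion.
/// Returns (number of non-empty batches produced, number of `Pending` yields).
fn drive(batches: Vec<Vec<i32>>, nominal: u32) -> (usize, usize) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut stream = FilterSketch {
        input: batches.into_iter(),
        budget: PollBudget::new(nominal),
    };
    let (mut produced, mut yields) = (0, 0);
    loop {
        match stream.poll_next(&mut cx) {
            Poll::Ready(Some(_)) => produced += 1,
            Poll::Ready(None) => return (produced, yields),
            Poll::Pending => yields += 1,
        }
    }
}
```

With a nominal budget of 3, `drive(vec![vec![-1]; 10], 3)` yields on the 4th and 8th fully filtered batch, while an input where every batch survives the filter never touches the counter beyond resetting it. The yield happens one poll after the budget reaches zero, which matches the "more of a hint than a hard limit" behaviour described above.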