alamb commented on code in PR #14028: URL: https://github.com/apache/datafusion/pull/14028#discussion_r1906059983
########## datafusion/core/src/datasource/physical_plan/file_stream.rs: ########## @@ -478,7 +478,12 @@ impl<F: FileOpener> FileStream<F> { reader, )), partition_values, - } + }; + // Return control to the runtime when we're ready to open the next file + // to prevent uncancellable queries in scenarios with many large files. + // This functions similarly to a `tokio::task::yield_now()`. + cx.waker().wake_by_ref(); + return Poll::Pending; Review Comment: I wonder if it would be possible to call `yield_now` directly rather than trying to insert waker manipulation directly into the file opener Like could we do something like ```rust let future = tokio::task::yield_now() ``` And return `future.poll_next()` 🤔 For example, what if we made a RecordBatchStream that wrapped another stream and called `yield_now` after some number of batches returned? ########## datafusion/core/src/datasource/physical_plan/file_stream.rs: ########## @@ -478,7 +478,12 @@ impl<F: FileOpener> FileStream<F> { reader, )), partition_values, - } + }; + // Return control to the runtime when we're ready to open the next file + // to prevent uncancellable queries in scenarios with many large files. + // This functions similarly to a `tokio::task::yield_now()`. + cx.waker().wake_by_ref(); + return Poll::Pending; Review Comment: I wonder if it would be possible to call `yield_now` directly rather than trying to insert waker manipulation directly into the file opener Like could we do something like ```rust let future = tokio::task::yield_now() ``` And return `future.poll_next()` 🤔 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org