alamb commented on code in PR #14028:
URL: https://github.com/apache/datafusion/pull/14028#discussion_r1906059983


##########
datafusion/core/src/datasource/physical_plan/file_stream.rs:
##########
@@ -478,7 +478,12 @@ impl<F: FileOpener> FileStream<F> {
                                                     reader,
                                                 )),
                                                 partition_values,
-                                            }
+                                            };
+                                            // Return control to the runtime 
when we're ready to open the next file
+                                            // to prevent uncancellable 
queries in scenarios with many large files.
+                                            // This functions similarly to a 
`tokio::task::yield_now()`.
+                                            cx.waker().wake_by_ref();
+                                            return Poll::Pending;

Review Comment:
   I wonder if it would be possible to call `yield_now` directly rather than 
trying to insert waker manipulation directly into the file opener
   
   Like could we do something like
   ```rust
   let future = tokio::task::yield_now()
   ```
   
   And return `future.poll_next()` 🤔 
   
   
   For example, what if we made a RecordBatchStream that wrapped another stream 
and called `yield_now` after some number of batches returned?
   
   



##########
datafusion/core/src/datasource/physical_plan/file_stream.rs:
##########
@@ -478,7 +478,12 @@ impl<F: FileOpener> FileStream<F> {
                                                     reader,
                                                 )),
                                                 partition_values,
-                                            }
+                                            };
+                                            // Return control to the runtime 
when we're ready to open the next file
+                                            // to prevent uncancellable 
queries in scenarios with many large files.
+                                            // This functions similarly to a 
`tokio::task::yield_now()`.
+                                            cx.waker().wake_by_ref();
+                                            return Poll::Pending;

Review Comment:
   I wonder if it would be possible to call `yield_now` directly rather than 
trying to insert waker manipulation directly into the file opener
   
   Like could we do something like
   ```rust
   let future = tokio::task::yield_now()
   ```
   
   And return `future.poll_next()` 🤔 
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to