crepererum commented on code in PR #14028:
URL: https://github.com/apache/datafusion/pull/14028#discussion_r1906929864


##########
datafusion/core/src/datasource/physical_plan/file_stream.rs:
##########
@@ -478,7 +478,12 @@ impl<F: FileOpener> FileStream<F> {
                                                     reader,
                                                 )),
                                                 partition_values,
-                                            }
+                                            };
+                                            // Return control to the runtime 
when we're ready to open the next file
+                                            // to prevent uncancellable 
queries in scenarios with many large files.
+                                            // This functions similarly to a 
`tokio::task::yield_now()`.
+                                            cx.waker().wake_by_ref();
+                                            return Poll::Pending;

Review Comment:
   > > I think this may not be correct. The `wake_by_ref` should be called 
AFTER returning `Pending`. I think you may be lucky that this works, but it's 
undefined behavior. If I understand the `yield_now` impl. correctly, it tries 
to delay the "wake" call.
   > 
   > I was thinking similarly, but those have changed my mind: 
[rust-lang/rust@`1f81f90`/library/core/src/task/wake.rs#L423-L426](https://github.com/rust-lang/rust/blob/1f81f906893d166d05fb4839f169983f2b564cc7/library/core/src/task/wake.rs#L423-L426)
   > 
   > and this example: 
[rust-lang/rust@`1f81f90`/library/core/src/task/wake.rs#L703-L704](https://github.com/rust-lang/rust/blob/1f81f906893d166d05fb4839f169983f2b564cc7/library/core/src/task/wake.rs#L703-L704)
   
   good point, thanks
   
   > Do we have any example of this in the codebase (returning pending not 
because of an IO)?
   
   We do, see #5299 .



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to