pepijnve commented on code in PR #16319:
URL: https://github.com/apache/datafusion/pull/16319#discussion_r2136181096


##########
datafusion/physical-plan/src/sorts/sort.rs:
##########
@@ -1126,14 +1127,20 @@ impl ExecutionPlan for SortExec {
                 Ok(Box::pin(RecordBatchStreamAdapter::new(
                     self.schema(),
                     futures::stream::once(async move {
-                        while let Some(batch) = input.next().await {
-                            let batch = batch?;
-                            topk.insert_batch(batch)?;
-                            if topk.finished {
-                                break;
+                        // Spawn a task the first time the stream is polled 
for the sort phase.
+                        // This ensures the consumer of the sort does not poll 
unnecessarily
+                        // while the sort is ongoing

Review Comment:
   I intentionally tried to avoid that and I'm fairly sure it does, but I'll 
try to come up with a test that demonstrates it since it's a very important 
detail.
   
   These kinds of constructs have an Inceptiony quality to them, don't they. If 
I got my head wrapped around Futures and async correctly they're essentially 
interchangeable and inert until first polled.
   
   What should be happening here is that you get `stream::once` creates a 
stream consisting of a single element which is to be produced by a future. The 
future in question is only polled the first time the stream is polled.
   
   That future is an async block which in its first poll spawns the task. In 
other words, the spawn is deferred until first poll. Then we await the spawned 
task, which is just polling the JoinHandle.
   
   Anyway, I don't want you to take my word for it. I'll get to work on a test 
case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to