jizezhang commented on code in PR #19002:
URL: https://github.com/apache/datafusion/pull/19002#discussion_r2621826156


##########
datafusion/physical-plan/src/repartition/mod.rs:
##########
@@ -1531,6 +1542,43 @@ impl PerPartitionStream {
             }
         }
     }
+
+    fn poll_next_and_coalesce(

Review Comment:
   Are you perhaps referring to this method call
https://github.com/apache/datafusion/blob/fc8824011bf5d4baccbfe51b3888ed5573ef3bfb/datafusion/physical-plan/src/repartition/mod.rs#L542
 when pulling batches from the input partitions, and suggesting that we could
combine it with coalescing? If so, whether to coalesce batches in the input
partition stream or in the output partition stream was [discussed
briefly](https://github.com/apache/datafusion/issues/18782#issuecomment-3563395564).
The current implementation coalesces in the output stream, because that best
preserves the existing behavior. Since batches are sent over channels from the
input streams to the output streams, I am not sure how we would combine the two.
That said, I may have misunderstood you, or it might actually be better to
coalesce when pulling from the input streams, given the optimization. Please let
me know what you think.
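
   To make the output-side option concrete, here is a minimal std-only sketch. It is not DataFusion's implementation. Batches are modeled as `Vec<i64>` instead of Arrow `RecordBatch`es, and `coalesce_from_channel` and `run_demo` are hypothetical names. Input partitions push small batches into a channel, and the receiving (output) side re-chunks them into fixed-size batches:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for an Arrow `RecordBatch`: only the row values.
type Batch = Vec<i64>;

/// Receives small batches over a channel and re-chunks them into batches of
/// exactly `target_rows` rows; only the final batch may be smaller. This models
/// coalescing on the output-partition side, after batches cross the channel.
fn coalesce_from_channel(rx: mpsc::Receiver<Batch>, target_rows: usize) -> Vec<Batch> {
    assert!(target_rows > 0, "target_rows must be positive");
    let mut out = Vec::new();
    let mut buffer: Batch = Vec::new();
    // Iteration ends once every sender (input partition) has been dropped.
    for batch in rx {
        buffer.extend(batch);
        while buffer.len() >= target_rows {
            // `split_off` leaves exactly `target_rows` rows in `buffer`.
            let rest = buffer.split_off(target_rows);
            out.push(std::mem::replace(&mut buffer, rest));
        }
    }
    // Flush the remainder once all input partitions are exhausted.
    if !buffer.is_empty() {
        out.push(buffer);
    }
    out
}

/// Two hypothetical input partitions each send four 3-row batches (24 rows).
fn run_demo(target_rows: usize) -> Vec<Batch> {
    let (tx, rx) = mpsc::channel::<Batch>();
    let handles: Vec<_> = (0..2i64)
        .map(|p| {
            let tx = tx.clone();
            thread::spawn(move || {
                for b in 0..4i64 {
                    let start = p * 100 + b * 3;
                    tx.send((start..start + 3).collect()).unwrap();
                }
            })
        })
        .collect();
    // Drop the original sender so the receiver sees end-of-stream.
    drop(tx);
    let batches = coalesce_from_channel(rx, target_rows);
    for h in handles {
        h.join().unwrap();
    }
    batches
}

fn main() {
    // Row order across partitions depends on scheduling; batch sizes do not.
    let sizes: Vec<usize> = run_demo(8).iter().map(|b| b.len()).collect();
    println!("{:?}", sizes);
}
```

   In this model the producers stay unchanged because coalescing happens after the channel. The trade-off is that small batches still cross the channel one at a time.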
   
   In the sort-preserving case, a `BatchBuilder` is used
https://github.com/apache/datafusion/blob/fc8824011bf5d4baccbfe51b3888ed5573ef3bfb/datafusion/physical-plan/src/sorts/merge.rs#L44
 which has methods such as `push_row` and `build_record_batch`
https://github.com/apache/datafusion/blob/fc8824011bf5d4baccbfe51b3888ed5573ef3bfb/datafusion/physical-plan/src/sorts/builder.rs#L112-L125
 that internally use Arrow's `interleave`. Would this also be something to
replace or improve with the optimization you mentioned, or is that a different
case?
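
   For reference, here is a simplified model of the record-indices-then-materialize pattern, assuming a single `i64` column. Rows are recorded as `(batch_index, row_index)` pairs, the same index shape that Arrow's `interleave(values, indices)` accepts, and they are only gathered at build time. `IndexBuilder`, `interleave_i64` and `merge_two_sorted` are hypothetical. They skip DataFusion's cursor, schema and memory handling:

```rust
/// Simplified model of Arrow's `interleave` kernel for one i64 column:
/// each (batch_idx, row_idx) pair selects one value from one input batch.
fn interleave_i64(batches: &[Vec<i64>], indices: &[(usize, usize)]) -> Vec<i64> {
    indices.iter().map(|&(b, r)| batches[b][r]).collect()
}

/// Hypothetical cut-down builder: `push_row` only records indices, and
/// `build` materializes all recorded rows in one gather.
struct IndexBuilder {
    batches: Vec<Vec<i64>>,
    indices: Vec<(usize, usize)>,
}

impl IndexBuilder {
    fn new(batches: Vec<Vec<i64>>) -> Self {
        Self { batches, indices: Vec::new() }
    }

    fn push_row(&mut self, batch_idx: usize, row_idx: usize) {
        self.indices.push((batch_idx, row_idx));
    }

    fn build(&mut self) -> Vec<i64> {
        let out = interleave_i64(&self.batches, &self.indices);
        self.indices.clear();
        out
    }
}

/// Merges two sorted batches by recording indices, then building once.
fn merge_two_sorted(a: Vec<i64>, b: Vec<i64>) -> Vec<i64> {
    let (na, nb) = (a.len(), b.len());
    let mut builder = IndexBuilder::new(vec![a, b]);
    let (mut i, mut j) = (0, 0);
    while i < na || j < nb {
        // Take from batch 0 if batch 1 is exhausted or batch 0's row is not larger.
        let take_a = j >= nb || (i < na && builder.batches[0][i] <= builder.batches[1][j]);
        if take_a {
            builder.push_row(0, i);
            i += 1;
        } else {
            builder.push_row(1, j);
            j += 1;
        }
    }
    builder.build()
}

fn main() {
    println!("{:?}", merge_two_sorted(vec![1, 4, 7], vec![2, 3, 9]));
}
```

   In this model the copy work is deferred to a single gather in `build`, which is the step a coalescing optimization would most likely touch.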
   
   Thanks!
   


