alamb commented on code in PR #19002:
URL: https://github.com/apache/datafusion/pull/19002#discussion_r2591927995
##########
datafusion/physical-plan/src/repartition/mod.rs:
##########
@@ -1540,7 +1550,52 @@ impl Stream for PerPartitionStream {
mut self: Pin<&mut Self>,
cx: &mut Context<'_>,
) -> Poll<Option<Self::Item>> {
- let poll = self.poll_next_inner(cx);
+ let poll = match self.batch_coalescer.take() {
+ Some(mut coalescer) => {
+ let cloned_time =
self.baseline_metrics.elapsed_compute().clone();
+ let mut completed = false;
+ let poll;
+ loop {
+ if let Some(batch) = coalescer.next_completed_batch() {
+ poll = Poll::Ready(Some(Ok(batch)));
+ break;
+ }
+ if completed {
+ poll = Poll::Ready(None);
+ break;
+ }
+ let inner_poll = self.poll_next_inner(cx);
+ let _timer = cloned_time.timer();
+
+ match inner_poll {
Review Comment:
I found this logic slightly hard to follow and it is now in the main control
flow loop and is deeply indented
One thing that could help here is to factor the limiting into its own
function. So something like this
```rust
mut poll;
if let Some(coalescer) = self.batch_coalescer.take() {
poll = self.poll_next_and_colaesce(coalescer);
self.batch_coalescer = Some(coalescer);
} else {
poll = self.poll_next_inner()
}
self.baseline_metrics.record_poll(poll)
```
And then the logic for handling the coalsecer could be put into
`poll_next_and_colaesce`
##########
datafusion/physical-plan/src/repartition/mod.rs:
##########
@@ -1427,9 +1430,12 @@ struct PerPartitionStream {
/// Execution metrics
baseline_metrics: BaselineMetrics,
+
+ batch_coalescer: Option<LimitedBatchCoalescer>,
Review Comment:
I recommend adding some context here about when this could be null --
something like
```suggestion
/// None for sort preserving variant (merge sort already does coalescing)
batch_coalescer: Option<LimitedBatchCoalescer>,
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]