2010YOUY01 commented on PR #23184:
URL: https://github.com/apache/datafusion/pull/23184#issuecomment-4828821650
> Since `Distribution` was not designed for multiple children I like the
current `co_partitioned_with()` API as it can accept any `Distribution`s and
check them without a need to redesign the API.
Making this co-partition requirement part of the `ExecutionPlan` API seems
like a better long-term solution. The existing `co_partitioned_with` adds a
hidden assumption to `ExecutionPlan::required_input_distribution`: if both
sides are `KeyPartitioned`, then `left.co_partitioned_with(right)` will be
checked somewhere else. This creates a leaky abstraction somehow.
If we can instead make this an explicit requirement at the `ExecutionPlan`
level, the implementation is likely to become simpler and easier to understand.
```rust
// Before
fn required_input_distribution(&self) -> Vec<Distribution> {
vec![Distribution::UnspecifiedDistribution; self.children().len()]
}
```
```rust
// After
pub struct RequiredInputDistributions {
pub per_child: Vec<Distribution>,
pub cross_child: Vec<CrossChildDistribution>,
}
trait ExecutionPlan {
fn required_input_distributions(&self) -> RequiredInputDistributions {
...
}
}
```
The tradeoff is that this would require a large refactor upfront. The should
not be a hard requirement for now, and I'm also unsure how to carry this out
incrementally, but I would still love to see us move towards this direction
sooner.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]