alihandroid opened a new pull request, #11652:
URL: https://github.com/apache/datafusion/pull/11652
## Which issue does this PR close?
Closes #9792.
## Rationale for this change
Physical plans can be optimized further by pushing `GlobalLimitExec` and
`LocalLimitExec` down through certain nodes, or by replacing their child
nodes with versions that carry a fetch limit, without changing the result.
This reduces unnecessary data transfer and processing, making plan execution
more efficient. `CoalesceBatchesExec` can also benefit from this improvement,
so this PR implements fetch-limit support for it.
For example,
```
GlobalLimitExec: skip=0, fetch=5
  StreamingTableExec: partition_sizes=1, projection=[c1, c2, c3], infinite_source=true
```
can be turned into
```
StreamingTableExec: partition_sizes=1, projection=[c1, c2, c3], infinite_source=true, fetch=5
```
and
```
GlobalLimitExec: skip=0, fetch=5
  CoalescePartitionsExec
    FilterExec: c3@2 > 0
      RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1
        StreamingTableExec: partition_sizes=1, projection=[c1, c2, c3], infinite_source=true
```
can be turned into
```
GlobalLimitExec: skip=0, fetch=5
  CoalescePartitionsExec
    LocalLimitExec: fetch=5
      FilterExec: c3@2 > 0
        RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1
          StreamingTableExec: partition_sizes=1, projection=[c1, c2, c3], infinite_source=true
```
without changing the result, while using fewer resources and finishing faster.
The physical plan in the following excerpt
https://github.com/apache/datafusion/blob/ecf5323eaa38869ed2f911b02f98e17aa6db639a/datafusion/sqllogictest/test_files/repartition.slt#L116-L129
will turn into
```
01)GlobalLimitExec: skip=0, fetch=5
02)--CoalescePartitionsExec
03)----CoalesceBatchesExec: target_batch_size=8192, fetch=5
04)------FilterExec: c3@2 > 0
05)--------RepartitionExec: partitioning=RoundRobinBatch(3), input_partitions=1
06)----------StreamingTableExec: partition_sizes=1, projection=[c1, c2, c3], infinite_source=true
```
Other examples can be found in the tests provided in `limit_pushdown.rs` and
in other `.slt` tests.
## What changes are included in this PR?
Implement `LimitPushdown` Rule:
- Introduced new APIs in the `ExecutionPlan` trait:
  - `with_fetch(&self, fetch: Option<usize>) -> Option<Arc<dyn ExecutionPlan>>`:
    Returns a fetching version of the node if supported, `None` otherwise. The
    default implementation returns `None`.
  - `supports_limit_pushdown(&self) -> bool`: Returns `true` if a node
    supports limit pushdown. The default implementation returns `false`.
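To illustrate how the two methods are meant to be used, here is a minimal, self-contained sketch. It does not use DataFusion's real types: `PlanNode`, `StreamingSource`, `SortLike`, and `absorb_limit` are hypothetical stand-ins, and only the two method signatures and their defaults mirror the description above. It also assumes a limit with `skip=0`.

```rust
use std::sync::Arc;

/// Toy stand-in for the `ExecutionPlan` trait (hypothetical); only the two
/// new methods and a display name are modeled.
trait PlanNode {
    fn name(&self) -> String;

    /// Returns a version of this node that stops after `fetch` rows,
    /// or `None` if the node cannot absorb a limit (the default).
    fn with_fetch(&self, _fetch: Option<usize>) -> Option<Arc<dyn PlanNode>> {
        None
    }

    /// Whether a limit may be moved below this node without changing the
    /// result (defaults to `false`).
    fn supports_limit_pushdown(&self) -> bool {
        false
    }
}

/// Hypothetical source node that can absorb a limit.
struct StreamingSource {
    fetch: Option<usize>,
}

impl PlanNode for StreamingSource {
    fn name(&self) -> String {
        match self.fetch {
            Some(n) => format!("StreamingSource: fetch={}", n),
            None => "StreamingSource".to_string(),
        }
    }

    fn with_fetch(&self, fetch: Option<usize>) -> Option<Arc<dyn PlanNode>> {
        Some(Arc::new(StreamingSource { fetch }))
    }
}

/// Hypothetical node that keeps both default implementations.
struct SortLike;

impl PlanNode for SortLike {
    fn name(&self) -> String {
        "SortLike".to_string()
    }
}

/// Rewrites `Limit(fetch) -> child` (assuming skip=0): if the child has a
/// fetching version, the limit node is dropped; otherwise it stays in place.
/// Returns the resulting plan, one node name per entry, top to bottom.
fn absorb_limit(fetch: usize, child: Arc<dyn PlanNode>) -> Vec<String> {
    match child.with_fetch(Some(fetch)) {
        Some(fetching_child) => vec![fetching_child.name()],
        None => vec![format!("Limit: fetch={}", fetch), child.name()],
    }
}
```

Returning `Option` from `with_fetch` lets a rule try the rewrite on any node and fall back to keeping the limit when the node does not opt in.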
Add fetch support to `CoalesceBatchesExec`:
- Add `fetch` field and `with_fetch` implementation
- Implement fetch limit functionality
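
The idea behind a fetch limit on a coalescing operator can be sketched as follows. This is a simplified model, not the code in this PR: batches are plain `Vec<i32>` rather than Arrow record batches, and `coalesce_with_fetch` is a hypothetical helper. It shows the two behaviors that matter: the last batch is truncated to the remaining row budget, and the input is no longer polled once the budget is used up.

```rust
/// Concatenates small input batches into batches of at least
/// `target_batch_size` rows, emitting at most `fetch` rows in total.
/// (Hypothetical helper; batches are modeled as `Vec<i32>`.)
fn coalesce_with_fetch(
    input: impl IntoIterator<Item = Vec<i32>>,
    target_batch_size: usize,
    fetch: Option<usize>,
) -> Vec<Vec<i32>> {
    let mut output = Vec::new();
    let mut buffer: Vec<i32> = Vec::new();
    // Remaining row budget; "no limit" is modeled as usize::MAX.
    let mut remaining = fetch.unwrap_or(usize::MAX);

    if remaining == 0 {
        return output;
    }

    for mut batch in input {
        // Keep only as many rows as the budget still allows.
        batch.truncate(remaining);
        remaining -= batch.len();
        buffer.extend(batch);

        // Flush when the buffer is large enough or the limit is reached.
        if buffer.len() >= target_batch_size || remaining == 0 {
            output.push(std::mem::take(&mut buffer));
        }
        // Stop pulling from the input as soon as the limit is satisfied.
        if remaining == 0 {
            break;
        }
    }

    // Emit whatever is left if the input ended before the limit was hit.
    if !buffer.is_empty() {
        output.push(buffer);
    }
    output
}
```

Because the loop exits as soon as the budget reaches zero, the sketch terminates even on an unbounded input, which is the situation in the `infinite_source=true` plans above.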
## Are these changes tested?
Unit tests are provided for `LimitPushdown` and for the new fetch support in
`CoalesceBatchesExec`.
## Are there any user-facing changes?
No. The changes only affect performance.