LiaCastaneda opened a new issue, #17541:
URL: https://github.com/apache/datafusion/issues/17541
### Describe the bug
I see duplicated OR clauses in the `DynamicFilterPhysicalExpr` that the consumer
receives for an execution plan like this:
```
ProjectionExec: expr=[c0@0 as c0, c1@1 as c1, c2@2 as c2]
  CoalescePartitionsExec: fetch=5
    CoalesceBatchesExec: target_batch_size=8192, fetch=5
      HashJoinExec: mode=CollectLeft, join_type=Inner, on=[(c0@0, c32@32)]
        CoalesceBatchesExec: target_batch_size=8192
          FilterExec: c0@0 IS NOT NULL
            DataSourceExec: partitions=1, partition_sizes=[1]
        RepartitionExec: partitioning=RoundRobinBatch(16), input_partitions=1
          CooperativeExec
            DataSourceExec: partitions=1
```
The bounds predicate arrives as 16 identical disjuncts, apparently one per
right-side output partition:
```
(
("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
)
```
This is probably related to
[this](https://github.com/apache/datafusion/pull/17197#pullrequestreview-3135737978)
comment. I wrote some logic in the consumer node to dedup the predicates, but
this seems worth handling in DataFusion itself.
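For reference, the consumer-side dedup can be sketched roughly as below. This is a minimal sketch over a toy `Expr` enum, not DataFusion's `PhysicalExpr` API: `Expr`, `split_disjunction`, `dedup_disjunction`, `bounds`, and `repeated_bounds` are all hypothetical names made up for illustration. It flattens the OR chain, keeps the first occurrence of each disjunct, and rebuilds the chain.

```rust
use std::collections::HashSet;

/// Toy stand-in for an expression tree (hypothetical, NOT DataFusion's `PhysicalExpr`).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Expr {
    /// Models `column >= lower AND column <= upper`.
    Bounds { column: String, lower: String, upper: String },
    /// Models `left OR right`.
    Or(Box<Expr>, Box<Expr>),
}

/// Flatten a nested OR chain into its disjuncts, left to right.
fn split_disjunction(expr: &Expr, out: &mut Vec<Expr>) {
    match expr {
        Expr::Or(left, right) => {
            split_disjunction(left, out);
            split_disjunction(right, out);
        }
        other => out.push(other.clone()),
    }
}

/// Drop repeated disjuncts (keeping first-seen order) and rebuild the OR chain.
fn dedup_disjunction(expr: &Expr) -> Expr {
    let mut parts = Vec::new();
    split_disjunction(expr, &mut parts);

    let mut seen = HashSet::new();
    // `insert` returns false for a value already seen, which filters it out.
    let mut unique = parts.into_iter().filter(|p| seen.insert(p.clone()));

    // `split_disjunction` always yields at least one operand.
    let first = unique.next().expect("a disjunction has at least one operand");
    unique.fold(first, |acc, p| Expr::Or(Box::new(acc), Box::new(p)))
}

/// The bounds predicate from this report.
fn bounds() -> Expr {
    Expr::Bounds {
        column: "c32".to_string(),
        lower: "db-01".to_string(),
        upper: "keb-03".to_string(),
    }
}

/// Build `bounds() OR bounds() OR ...` with `n` operands (n >= 1).
fn repeated_bounds(n: usize) -> Expr {
    (1..n).fold(bounds(), |acc, _| Expr::Or(Box::new(acc), Box::new(bounds())))
}
```

With this toy model, `dedup_disjunction(&repeated_bounds(16))` collapses the 16-way OR back to the single bounds predicate. Structural equality is enough here because the repeated disjuncts are identical.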
Following the code, in `CollectLeft` we derive the number of output
predicates from the
[right](https://github.com/apache/datafusion/blob/50733bcb801e21766dbed6e2403e87cfad7f8007/datafusion/physical-plan/src/joins/hash_join/shared_bounds.rs#L157)
side’s partition count. But if I understand correctly, `CollectLeft` collects the
left side into a single partition, so shouldn't every right-side partition see
the same bounds?
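To make the reasoning concrete, here is a toy model of that behavior, not the actual `shared_bounds.rs` code. The function names `key_bounds` and `bounds_seen_by_probe_partitions` are hypothetical, and the sample key `fra-02` is invented for the example. It assumes the bounds are simply the min and max of the build-side join keys: with a single collected build side, each of the 16 probe partitions ends up with the same pair.

```rust
/// Min and max of the build-side join keys; `None` for an empty build side.
/// Assumption: bounds are plain lexicographic min/max over the keys.
fn key_bounds(build_keys: &[&str]) -> Option<(String, String)> {
    let min = build_keys.iter().min()?;
    let max = build_keys.iter().max()?;
    Some((min.to_string(), max.to_string()))
}

/// Toy model of `CollectLeft`: one build side is shared by every probe
/// partition, so each probe partition sees an identical copy of the bounds.
fn bounds_seen_by_probe_partitions(
    build_keys: &[&str],
    probe_partitions: usize,
) -> Vec<(String, String)> {
    match key_bounds(build_keys) {
        Some(bounds) => vec![bounds; probe_partitions],
        None => Vec::new(),
    }
}
```

Calling `bounds_seen_by_probe_partitions(&["keb-03", "db-01", "fra-02"], 16)` yields 16 copies of `("db-01", "keb-03")`, which is the shape of the duplicated filter above. If that model holds, a single bounds predicate would be enough in `CollectLeft` mode.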
### To Reproduce
_No response_
### Expected behavior
_No response_
### Additional context
_No response_