LiaCastaneda opened a new issue, #17541:
URL: https://github.com/apache/datafusion/issues/17541

   ### Describe the bug
   
   I see duplicated OR clauses in the DynamicPhysicalExpr that my consumer receives for an execution plan like this:
   ```
   
   ProjectionExec: expr=[c0@0 as c0, c1@1 as c1, c2@2 as c2]
     CoalescePartitionsExec: fetch=5
       CoalesceBatchesExec: target_batch_size=8192, fetch=5
         HashJoinExec: mode=CollectLeft, join_type=Inner, on=[(c0@0, c32@32)]
           CoalesceBatchesExec: target_batch_size=8192
             FilterExec: c0@0 IS NOT NULL
               DataSourceExec: partitions=1, partition_sizes=[1]
           RepartitionExec: partitioning=RoundRobinBatch(16), input_partitions=1
             CooperativeExec
               DataSourceExec: partitions=1
   ```
   
   The bounds predicates arrive as 16 identical disjuncts, apparently one per right-side output partition:
   
   ```
   (
     ("c32" >= 'db-01' AND "c32" <= 'keb-03')
     OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
     OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
     OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
     OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
     OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
     OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
     OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
     OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
     OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
     OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
     OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
     OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
     OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
     OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
     OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
   )
   ```
   
   
   This is probably related to [this comment](https://github.com/apache/datafusion/pull/17197#pullrequestreview-3135737978). I wrote some logic in the consumer node to deduplicate the predicates, but it seems worth handling in DataFusion itself.
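
For reference, the kind of consumer-side deduplication mentioned above could look roughly like the sketch below. It is self-contained and does not use DataFusion's expression types: `RangeClause`, `dedup_clauses`, and `render_or` are hypothetical names invented for illustration, and a real consumer would compare physical expressions rather than strings.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for one `col >= min AND col <= max` clause.
/// This is not a DataFusion type.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct RangeClause {
    column: String,
    min: String,
    max: String,
}

/// Drop repeated clauses while keeping first-seen order.
fn dedup_clauses(clauses: &[RangeClause]) -> Vec<RangeClause> {
    let mut seen = HashSet::new();
    let mut unique = Vec::new();
    for clause in clauses {
        // `insert` returns false when an equal clause was already recorded.
        if seen.insert(clause) {
            unique.push(clause.clone());
        }
    }
    unique
}

/// Render the remaining clauses as a single OR expression.
fn render_or(clauses: &[RangeClause]) -> String {
    clauses
        .iter()
        .map(|c| format!("(\"{0}\" >= '{1}' AND \"{0}\" <= '{2}')", c.column, c.min, c.max))
        .collect::<Vec<_>>()
        .join(" OR ")
}
```

Applied to the 16 identical clauses shown earlier, this collapses the filter to a single range check on `c32`.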
   
   Following the code, in `CollectLeft` mode we derive the number of output predicates from the [right side's partition count](https://github.com/apache/datafusion/blob/50733bcb801e21766dbed6e2403e87cfad7f8007/datafusion/physical-plan/src/joins/hash_join/shared_bounds.rs#L157). But if I understand correctly, `CollectLeft` collects the left side into a single partition, so every right-side partition should see the same bounds?
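
To make the reasoning concrete, here is a toy model of the behavior as I understand it. It is only a sketch and not DataFusion's code: `Mode`, `Bounds`, `report_per_right_partition`, and `entries_needed` are hypothetical names, and the "one entry for `CollectLeft`" rule is my assumption about what a fix might do.

```rust
/// Hypothetical mirror of the two join modes discussed here (not DataFusion's enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    CollectLeft,
    Partitioned,
}

/// Min/max of the join key computed from the build (left) side.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Bounds {
    min: String,
    max: String,
}

/// Models what the issue describes: every right-side partition records the
/// bounds of the single collected build side, so the list holds N copies.
fn report_per_right_partition(shared: &Bounds, right_partitions: usize) -> Vec<Bounds> {
    vec![shared.clone(); right_partitions]
}

/// Assumed alternative: with one build partition in CollectLeft, a single
/// entry already describes the bounds; Partitioned keeps one per partition.
fn entries_needed(mode: Mode, right_partitions: usize) -> usize {
    match mode {
        Mode::CollectLeft => 1,
        Mode::Partitioned => right_partitions,
    }
}
```

With `RoundRobinBatch(16)` on the right side, the first function yields 16 equal entries, which matches the 16 identical clauses in the filter, while `entries_needed(Mode::CollectLeft, 16)` would give one.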
   
   
   ### To Reproduce
   
   _No response_
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_

