ctsk commented on PR #15418: URL: https://github.com/apache/datafusion/pull/15418#issuecomment-2761412282
> You mean coalesce_partitions_if_needed() call is redundant in datafusion? I don't think that's the case, but if it is so, why don't we remove that line?

I wanted to keep the PR simple and *just* refactor. Here is a follow-up that removes it: https://github.com/apache/datafusion/pull/15476. Let's see if the tests pass :)

EnforceDistribution adds it here:
https://github.com/apache/datafusion/blob/fa452e6789194e6da7c3aaf5e2f1ffa5679aacf2/datafusion/physical-optimizer/src/enforce_distribution.rs#L940-L968

`add_spm_on_top` gets called when the child operator has the `SinglePartition` requirement:
https://github.com/apache/datafusion/blob/fa452e6789194e6da7c3aaf5e2f1ffa5679aacf2/datafusion/physical-optimizer/src/enforce_distribution.rs#L1256-L1259

This applies to a CollectLeft HJ's left child:
https://github.com/apache/datafusion/blob/fa452e6789194e6da7c3aaf5e2f1ffa5679aacf2/datafusion/physical-plan/src/joins/hash_join.rs#L705-L710

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

For additional commands, e-mail: github-h...@datafusion.apache.org