peter-toth commented on code in PR #54330:
URL: https://github.com/apache/spark/pull/54330#discussion_r2880203461
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala:
##########
@@ -549,23 +521,19 @@ case class EnsureRequirements(
// whether partially clustered distribution can be applied. For instance, the
// optimization cannot be applied to a left outer join, where the left hand
// side is chosen as the side to replicate partitions according to stats.
- // Similarly, the partially clustered distribution cannot be applied if the
- // partially clustered side must use the scan's key-grouped partitioning to
- // satisfy some unrelated required distribution in its plan (for example, for an aggregate
- // or window function), as this will give incorrect results (for example, duplicate
- // row_number() values).
// Otherwise, query result could be incorrect.
- val canReplicateLeft = canReplicateLeftSide(joinType) &&
- canApplyPartialClusteredDistribution(right)
- val canReplicateRight = canReplicateRightSide(joinType) &&
- canApplyPartialClusteredDistribution(left)
+ val canReplicateLeft = canReplicateLeftSide(joinType)
+ val canReplicateRight = canReplicateRightSide(joinType)
Review Comment:
There is a partial clustering example in the PR description that somewhat
illustrates this problem. Before this PR, the outer join node was not allowed to
request partially clustered partitions from the leaf scans because there is an
inner join node.
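
For readers without the PR description at hand, the restriction named in the code comment above (a left outer join must not pick its left side as the side to replicate) can be reproduced with plain collections. This is only a minimal sketch under stated assumptions: `PartialClusteringSketch`, the simplified `JoinType` objects, and the sample rows are hypothetical and are not Spark's classes, and the two `canReplicate*` helpers only model the left/right outer cases rather than the real implementation in `EnsureRequirements`.

```scala
// Hypothetical, simplified model for illustration; these are not Spark's classes.
sealed trait JoinType
case object Inner extends JoinType
case object LeftOuter extends JoinType
case object RightOuter extends JoinType

object PartialClusteringSketch {
  type Row = (Int, String)                    // (join key, payload)
  type Joined = (Int, String, Option[String]) // right payload is None when unmatched

  // Assumption: a side may be replicated only when the join does not preserve
  // its unmatched rows, mirroring the left outer join case in the code comment.
  def canReplicateLeftSide(joinType: JoinType): Boolean = joinType != LeftOuter
  def canReplicateRightSide(joinType: JoinType): Boolean = joinType != RightOuter

  // Left outer join of a single pair of partitions.
  def leftOuterJoin(left: Seq[Row], right: Seq[Row]): Seq[Joined] =
    left.flatMap { case (key, leftValue) =>
      val matches = right.filter(_._1 == key).map(_._2)
      if (matches.isEmpty) Seq((key, leftValue, Option.empty[String]))
      else matches.map(rightValue => (key, leftValue, Option(rightValue)))
    }

  // One left partition and the matching right partition, the latter split in
  // two as partial clustering would do for a skewed partition.
  val leftPartition: Seq[Row] = Seq((1, "l1"), (2, "l2"))
  val rightSplits: Seq[Seq[Row]] = Seq(Seq((1, "r1")), Seq((2, "r2")))

  // Joining against the whole right partition gives the expected rows.
  val expected: Seq[Joined] = leftOuterJoin(leftPartition, rightSplits.flatten)

  // Replicating the left partition once per right split emits spurious
  // unmatched rows, which is why this side must not be replicated.
  val replicated: Seq[Joined] =
    rightSplits.flatMap(split => leftOuterJoin(leftPartition, split))

  def main(args: Array[String]): Unit = {
    println(s"expected:   $expected")
    println(s"replicated: $replicated")
  }
}
```

In this sketch the replicated variant produces four rows instead of two, because each left row is reported as unmatched in the split that does not contain its key.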