peter-toth commented on code in PR #54330:
URL: https://github.com/apache/spark/pull/54330#discussion_r2847517879


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala:
##########
@@ -416,63 +480,43 @@ case class KeyGroupedPartitioning(
     val result = KeyGroupedShuffleSpec(this, distribution)
     if (SQLConf.get.v2BucketingAllowJoinKeysSubsetOfPartitionKeys) {
       // If allowing join keys to be subset of clustering keys, we should 
create a new
-      // `KeyGroupedPartitioning` here that is grouped on the join keys 
instead, and use that as
+      // `KeyedPartitioning` here that is grouped on the join keys instead, 
and use that as
       // the returned shuffle spec.
       val joinKeyPositions = 
result.keyPositions.map(_.nonEmpty).zipWithIndex.filter(_._1).map(_._2)
-      val projectedPartitioning = KeyGroupedPartitioning(expressions, 
joinKeyPositions,
-          partitionValues, originalPartitionValues)
+      val (projectedExpressions, projectedDataTypes, projectedKeys, 
projectedOriginalKeys) =
+        projectKeys(joinKeyPositions)
+      val projectedComparableWrapperFactory =
+        
InternalRowComparableWrapper.getInternalRowComparableWrapperFactory(projectedDataTypes)
+      val distinctPartitionKeys = 
projectedKeys.distinctBy(projectedComparableWrapperFactory)

Review Comment:
   You are right, ordering might change, but this use of `KeyedPartitioning` 
(and `KeyGroupedPartitioning` too before this PR) in a `KeyGroupedShuffleSpec`  
is special. It doesn't matter if the partition keys are sorted because later in 
`EnsureRequirements`:
   - they either fully match (`leftSpec.isCompatibleWith(rightSpec)`) without 
adding `GroupPartitionsExec` (or changing anything in `spjParam` of 
`BatchScanExec` before this PR);
   - or they don't match, but in that case we build a common, sorted sequence 
of keys in `mergePartitions()` and push that into `GroupPartitionsExec` (or 
into `spjParam` of `BatchScanExec` before this PR).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to