peter-toth commented on PR #54330:
URL: https://github.com/apache/spark/pull/54330#issuecomment-4007304115

   > > Also can we add more tests:
   > > 
   > > * Empty grouped partitions: Plan that yields `groupedPartitions.isEmpty` (i.e., a partitioned table with no partition values inserted)
   > 
   > @szehon-ho, I added an empty partitioned table test in 
[4a904ad](https://github.com/apache/spark/commit/4a904ad98a3870e99136cf88a797903feb6496b8),
 but it seems we prevent returning `KeyedPartitioning` without partitions. This 
is not new behaviour; it worked the same way with `KeyGroupedPartitioning` 
before this PR. If we removed the `inputPartitions.nonEmpty` guard from 
`BatchScanExec`, the 2 shuffles would disappear, but no `GroupPartitionsExec` 
would be added because none is needed. Maybe the only way to 
get `GroupPartitionsExec` with empty `groupedPartitions` is to enable 
`spark.sql.sources.v2.bucketing.partition.filter.enabled` and use disjoint sets 
of keys in the join legs to get empty `expectedPartitionKeys`. Let me check this 
tomorrow.
   
   @szehon-ho, I added a test case that yields `groupedPartitions.isEmpty` in 
https://github.com/apache/spark/pull/54330/commits/32b563fcd4f93652c227ce6ec9c6d31ccf9aee0b.
   The commit also cleans up the SPARK-55092 test case (scans should not group 
partitions), although that test is now fairly trivial because partition 
grouping has been moved out of scans.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

