peter-toth commented on PR #54330: URL: https://github.com/apache/spark/pull/54330#issuecomment-4007304115
> > Also can we add more tests:
> >
> > * Empty grouped partitions: Plan that yields groupedPartitions.isEmpty (i.e., partitioned table but no partition values inserted)
>
> @szehon-ho, I added an empty partitioned table test in [4a904ad](https://github.com/apache/spark/commit/4a904ad98a3870e99136cf88a797903feb6496b8), but it seems we prevent returning `KeyedPartitioning` without partitions. This is not new behaviour; it worked the same way with `KeyGroupedPartitioning` before this PR. If we removed that `inputPartitions.nonEmpty` guard from `BatchScanExec`, the 2 shuffles would disappear, but no `GroupPartitionsExec` would be added, as those are not needed. Maybe the only way to get `GroupPartitionsExec` with empty `groupedPartitions` is to enable `spark.sql.sources.v2.bucketing.partition.filter.enabled` and use disjoint sets of keys in the join legs to get empty `expectedPartitionKeys`. Let me check this tomorrow.

@szehon-ho, I added a test case that yields `groupedPartitions.isEmpty` in https://github.com/apache/spark/pull/54330/commits/32b563fcd4f93652c227ce6ec9c6d31ccf9aee0b. The commit also cleans up the SPARK-55092 (scans should not group partitions) test case, but that's kind of trivial now that partition grouping has been moved out of scans.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
