andygrove opened a new pull request, #4195: URL: https://github.com/apache/datafusion-comet/pull/4195
## Which issue does this PR close?

Closes #4191 (sub-issue of #4098).

## Rationale for this change

Three Spark 4.1 tests in `DataFrameSetOperationsSuite` were ignored under Comet:

- `SPARK-52921: union partitioning - reused shuffle`
- `SPARK-52921: union partitioning - semantic equality`
- `SPARK-52921: union partitioning - range partitioning`

The tests inspect the executed plan with strict pattern matches:

```scala
case u: UnionExec => u
case s: ShuffleExchangeExec => s
```

Under Comet, `UnionExec` is replaced by `CometUnionExec` (which extends `CometExec`, not `UnionExec`) and `ShuffleExchangeExec` is replaced by `CometShuffleExchangeExec` (which extends `ShuffleExchangeLike`, the trait both implementations share). The collectors therefore found zero operators, the `size == 1` assertions failed, and an `IgnoreComet` tag was added pointing at the umbrella tracking issue #4098.

## What changes are included in this PR?

Patch the matchers in `dev/diffs/4.1.1.diff` so the tests recognize Comet's wrappers:

- `case s: ShuffleExchangeExec` → `case s: ShuffleExchangeLike` (one trait that matches both implementations).
- `case u: UnionExec` → also match `case u: CometUnionExec` (no shared parent, so two cases are needed).

Both changes remain valid for vanilla Spark: `ShuffleExchangeExec` extends `ShuffleExchangeLike`, and the additional `CometUnionExec` case is simply unreachable when Comet is disabled.

## How are these changes tested?

The fix is test-side only (no production code change). The partitioning equality assertions still hold under Comet because:

- `CometShuffleExchangeExec.apply` (in `ShimCometShuffleExchangeExec`) sets `outputPartitioning = wrapped.outputPartitioning`, preserving the original shuffle's partitioning.
- `CometUnionExec.outputPartitioning` delegates to `originalPlan.outputPartitioning` (the wrapped `UnionExec`), which honors `UNION_OUTPUT_PARTITIONING` and computes against the original Spark children, so the SPARK-52921 semantics are preserved end to end.

The Spark SQL CI workflow will exercise the un-ignored tests on Spark 4.1.1.
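To illustrate the matcher change, here is a minimal, Spark-free sketch: the type names mirror the Spark/Comet class hierarchy described above, but these are standalone stand-ins, not the real classes. It shows why matching on the `ShuffleExchangeLike` trait catches both shuffle implementations, while the union side needs two explicit cases.

```scala
// Stand-in hierarchy: ShuffleExchangeExec and CometShuffleExchangeExec share a
// trait, but UnionExec and CometUnionExec have no common parent below SparkPlan.
trait SparkPlan
trait ShuffleExchangeLike extends SparkPlan
case class ShuffleExchangeExec(child: SparkPlan) extends ShuffleExchangeLike
case class CometShuffleExchangeExec(child: SparkPlan) extends ShuffleExchangeLike
case class UnionExec(children: Seq[SparkPlan]) extends SparkPlan
case class CometUnionExec(children: Seq[SparkPlan]) extends SparkPlan
case object Leaf extends SparkPlan

object MatcherSketch extends App {
  val plans: Seq[SparkPlan] = Seq(
    CometShuffleExchangeExec(Leaf), // what Comet produces
    CometUnionExec(Seq(Leaf))
  )

  // Old matcher: the concrete class misses Comet's shuffle wrapper.
  val strict = plans.collect { case s: ShuffleExchangeExec => s }
  println(strict.size) // 0 -- the size == 1 assertion would fail

  // New matcher: the shared trait covers both implementations.
  val relaxed = plans.collect { case s: ShuffleExchangeLike => s }
  println(relaxed.size) // 1

  // Unions need two cases because the wrappers share no parent;
  // the CometUnionExec case is simply unreachable under vanilla Spark.
  val unions = plans.collect {
    case u: UnionExec      => u
    case u: CometUnionExec => u
  }
  println(unions.size) // 1
}
```

The same reasoning explains why the patched matchers are safe on vanilla Spark: a `ShuffleExchangeExec` still satisfies the `ShuffleExchangeLike` case, and the extra `CometUnionExec` case never fires.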
