[PR] [SPARK-51738][SQL][FOLLOWUP] Fix HashJoin to accept structurally-equal types [spark]

via GitHub Wed, 09 Apr 2025 14:40:42 -0700


ueshin opened a new pull request, #50549:
URL: https://github.com/apache/spark/pull/50549


   ### What changes were proposed in this pull request?
   
   This is a follow-up of #50537.
   
   Fixes `HashJoin` to accept structurally-equal types.
   
   ### Why are the changes needed?
   
   #50537 relaxed the type requirement for binary comparison. `HashJoin` should be relaxed in the same way; otherwise, it can fail with `IllegalArgumentException`.
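   
   To illustrate what "structurally-equal" means here, the following is a minimal, self-contained sketch. It does not use Spark's `DataType` classes: `SimpleType`, `FieldT`, `StructT`, `equalsStructurally`, and the `ignoreNullability` flag are hypothetical names for this illustration only, and the column types (`Int`, `String`) are assumed. Whether the actual check in `HashJoin` ignores nullability is not stated in this description.
   
   ```scala
   // Hypothetical, simplified model of SQL data types (not Spark's DataType API).
   sealed trait SimpleType
   case object IntT extends SimpleType
   case object StringT extends SimpleType
   final case class FieldT(name: String, dataType: SimpleType, nullable: Boolean)
   final case class StructT(fields: List[FieldT]) extends SimpleType
   
   object StructuralEquality {
     // Two types are treated as structurally equal when they have the same shape:
     // struct field names are ignored, while field count, order, and types must match.
     // Nullability is compared unless ignoreNullability is true (an assumption of this sketch).
     def equalsStructurally(a: SimpleType, b: SimpleType, ignoreNullability: Boolean): Boolean =
       (a, b) match {
         case (StructT(fa), StructT(fb)) =>
           fa.length == fb.length && fa.zip(fb).forall { case (x, y) =>
             (ignoreNullability || x.nullable == y.nullable) &&
               equalsStructurally(x.dataType, y.dataType, ignoreNullability)
           }
         case _ => a == b
       }
   }
   
   object Example {
     // Models struct(a, b) on one side and struct(c, d) on the other, with assumed column types.
     val probeKey: SimpleType =
       StructT(List(FieldT("a", IntT, nullable = true), FieldT("b", StringT, nullable = true)))
     val buildKey: SimpleType =
       StructT(List(FieldT("c", IntT, nullable = true), FieldT("d", StringT, nullable = true)))
   
     // Strict equality fails because the field names differ.
     val strictlyEqual: Boolean = probeKey == buildKey
     // Structural equality succeeds because only the shape is compared.
     val structurallyEqual: Boolean =
       StructuralEquality.equalsStructurally(probeKey, buildKey, ignoreNullability = true)
   }
   ```
   
   In this model, a requirement based on strict equality rejects the two key types, while one based on structural equality accepts them, which is the kind of relaxation this PR describes for join keys.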
   
   For example, in `SubquerySuite`:
   
   ```scala
   sql("""
         |SELECT foo IN (SELECT struct(c, d) FROM r)
         |FROM (SELECT struct(a, b) foo FROM l)
         |""".stripMargin).show()
   ```
   
   fails with:
   
   ```
   [info]   java.lang.IllegalArgumentException: requirement failed: Join keys from two sides should have same length and types
   [info]   at scala.Predef$.require(Predef.scala:337)
   [info]   at org.apache.spark.sql.execution.joins.HashJoin.org$apache$spark$sql$execution$joins$HashJoin$$x$6(HashJoin.scala:115)
   [info]   at org.apache.spark.sql.execution.joins.HashJoin.org$apache$spark$sql$execution$joins$HashJoin$$x$6$(HashJoin.scala:110)
   [info]   at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.org$apache$spark$sql$execution$joins$HashJoin$$x$6$lzycompute(BroadcastHashJoinExec.scala:40)
   [info]   at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.org$apache$spark$sql$execution$joins$HashJoin$$x$6(BroadcastHashJoinExec.scala:40)
   [info]   at org.apache.spark.sql.execution.joins.HashJoin.buildKeys(HashJoin.scala:110)
   [info]   at org.apache.spark.sql.execution.joins.HashJoin.buildKeys$(HashJoin.scala:110)
   [info]   at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.buildKeys$lzycompute(BroadcastHashJoinExec.scala:40)
   [info]   at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.buildKeys(BroadcastHashJoinExec.scala:40)
   [info]   at org.apache.spark.sql.execution.joins.HashJoin.buildBoundKeys(HashJoin.scala:130)
   [info]   at org.apache.spark.sql.execution.joins.HashJoin.buildBoundKeys$(HashJoin.scala:129)
   [info]   at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.buildBoundKeys$lzycompute(BroadcastHashJoinExec.scala:40)
   [info]   at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.buildBoundKeys(BroadcastHashJoinExec.scala:40)
   [info]   at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.requiredChildDistribution(BroadcastHashJoinExec.scala:63)
   ...
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, `HashJoin` now accepts join keys with structurally-equal types instead of failing with `IllegalArgumentException`.
   
   ### How was this patch tested?
   
   Added the related test.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   


