[PR] [SPARK-51400] Replace ArrayContains nodes to InSet [spark]

via GitHub Wed, 05 Mar 2025 05:32:31 -0800


adrians opened a new pull request, #50170:
URL: https://github.com/apache/spark/pull/50170


   ### What changes were proposed in this pull request?
   
   Add an optimization rule that replaces `ArrayContains` predicates with `InSet` ones.
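A minimal sketch of the rewrite, using a toy expression tree in Scala. The case classes and the `ReplaceArrayContainsWithInSet` object are hypothetical stand-ins, not Catalyst's classes or this PR's actual rule. The sketch assumes the rule only fires when the array argument is a literal, so that its elements can be collected into a set, and it ignores null handling.

```scala
// Hypothetical toy expression tree; NOT Spark's Catalyst expression classes.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class ArrayLit(values: Seq[Int]) extends Expr
final case class ArrayContains(array: Expr, value: Expr) extends Expr
final case class InSet(child: Expr, hset: Set[Int]) extends Expr
final case class And(left: Expr, right: Expr) extends Expr

object ReplaceArrayContainsWithInSet {
  // Walks the tree; an ArrayContains whose array side is a literal
  // (an assumption of this sketch) becomes an InSet over the array's distinct elements.
  def apply(e: Expr): Expr = e match {
    case ArrayContains(ArrayLit(vs), v) => InSet(apply(v), vs.toSet)
    case ArrayContains(a, v)            => ArrayContains(apply(a), apply(v))
    case And(l, r)                      => And(apply(l), apply(r))
    case other                          => other // leaves, and already-rewritten nodes
  }
}
```

Here `array_contains(array(1, 2, 2, 3), p)` becomes `InSet(p, {1, 2, 3})`, while an `ArrayContains` over a non-literal array column is left untouched because its contents are unknown at planning time.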
   
   ### Why are the changes needed?
   
   This is a performance optimization.
   Although `ArrayContains` and `InSet` are functionally similar, `InSet` is better optimized:
   `InSet` predicates are pushed down to the data sources, whereas `ArrayContains` predicates are not.
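To illustrate why the rewrite matters for pushdown, the toy model below treats only `InSet`-style predicates as translatable into a filter the data source can evaluate during the scan; anything untranslatable stays a post-scan filter. All types and the `PushdownSketch` object are hypothetical and greatly simplified; Spark's real filter translation differs in detail.

```scala
// Hypothetical predicate model; these types are illustrative, not Spark's API.
sealed trait Predicate
final case class ArrayContainsPred(values: Seq[Int], column: String) extends Predicate
final case class InSetPred(column: String, values: Set[Int]) extends Predicate

// A filter the (hypothetical) data source can evaluate during the scan.
final case class PushedIn(column: String, values: Set[Int])

object PushdownSketch {
  // In this model, only InSet-style predicates have a source-level translation.
  def translate(p: Predicate): Option[PushedIn] = p match {
    case InSetPred(col, vs)   => Some(PushedIn(col, vs))
    case _: ArrayContainsPred => None
  }

  // Splits predicates into those pushed to the source and those left as post-scan filters.
  def plan(preds: Seq[Predicate]): (Seq[PushedIn], Seq[Predicate]) = {
    val pushed   = preds.flatMap(p => translate(p).toSeq)
    val postScan = preds.filter(p => translate(p).isEmpty)
    (pushed, postScan)
  }
}
```

Without the rewrite, `ArrayContainsPred(Seq(1, 2), "p")` ends up in the post-scan list; after rewriting it to `InSetPred("p", Set(1, 2))`, the same condition is pushed to the source.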
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Ran SQL queries such as `SELECT * FROM ... WHERE array_contains(.... , partitionColumn)` and inspected the execution plans. With this optimization, the plan applies more aggressive pushdown filtering; without it, the expression remains a post-scan predicate.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


