Re: [PR] [SPARK-51400] Replace ArrayContains nodes to InSet [spark]

via GitHub Wed, 19 Mar 2025 08:40:26 -0700


adrians commented on PR #50170:
URL: https://github.com/apache/spark/pull/50170#issuecomment-2737117875


   I prefer the optimizer-rule approach because its output plan can be further 
improved by other optimization rules, provided those rules are written to match 
`InSet` nodes in the query plan.
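To make the optimizer-rule approach concrete, here is a minimal, self-contained Scala sketch. It does not use Spark's real Catalyst classes: `Expr`, `ArrayLit`, `transformUp`, and `ReplaceArrayContainsWithInSet` are hypothetical stand-ins. The rule only fires when the array is a null-free literal, an assumption that sidesteps the null-semantics questions a real rule would have to settle.

```scala
// Toy model of a Catalyst-style expression tree (NOT Spark's real classes).
sealed trait Expr {
  // Bottom-up rewrite: rewrite the children first, then try the rule on this node.
  def transformUp(rule: PartialFunction[Expr, Expr]): Expr = {
    val withNewChildren: Expr = this match {
      case ArrayContains(arr, v) => ArrayContains(arr.transformUp(rule), v.transformUp(rule))
      case InSet(v, set)         => InSet(v.transformUp(rule), set)
      case And(l, r)             => And(l.transformUp(rule), r.transformUp(rule))
      case leaf                  => leaf
    }
    // Nodes the rule does not match are returned unchanged.
    rule.applyOrElse(withNewChildren, (e: Expr) => e)
  }
}
final case class Attr(name: String) extends Expr
final case class ArrayLit(values: Seq[Any]) extends Expr
final case class ArrayContains(array: Expr, value: Expr) extends Expr
final case class InSet(value: Expr, set: Set[Any]) extends Expr
final case class And(left: Expr, right: Expr) extends Expr

// Hypothetical rule: array_contains(<literal array>, v)  ==>  v IN (<literal set>).
object ReplaceArrayContainsWithInSet {
  // Assumption: only null-free literal arrays are rewritten.
  val rule: PartialFunction[Expr, Expr] = {
    case ArrayContains(ArrayLit(values), v) if !values.contains(null) =>
      InSet(v, values.toSet)
  }
  def apply(e: Expr): Expr = e.transformUp(rule)
}

object Demo {
  def main(args: Array[String]): Unit = {
    val filter = And(
      ArrayContains(ArrayLit(Seq("aaaaaa", "bbbbbb")), Attr("col1")),
      Attr("flag"))
    // The nested ArrayContains is replaced by an InSet; the rest of the tree is kept.
    println(ReplaceArrayContainsWithInSet(filter))
  }
}
```

Because the rewrite produces an ordinary `InSet` node in the logical plan, any rule that runs later and pattern-matches on `InSet` sees it, which is the property the comment argues for.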
   
   For example, the Iceberg extension has the 
[ReplaceStaticInvoke](https://github.com/apache/iceberg/blob/apache-iceberg-1.8.1/spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceStaticInvoke.scala#L72-L76)
rule, which matches `InSet` nodes and enables efficient execution of queries 
like `SELECT * FROM tbl WHERE system.truncate(6, col1) IN ('aaaaaa', 'bbbbbb', 'cccccc')` 
(assuming the table is partitioned by `truncate(6, col1)`).
   
   Implementing the change as a pattern match inside 
`DataSourceStrategy.translateFilterWithMapping` would mean that only the 
data source could improve the execution plan; higher-level optimizer rules 
would never see the resulting `InSet` and could not act on it.
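The composability argument can be sketched with a second toy model. None of these names are Spark or Iceberg APIs: `Call` stands in for a function call such as `system.truncate(6, col1)`, `PartitionFilter` is a hypothetical marker for a predicate the source can prune on, and `pruneOnTransform` is only loosely inspired by rules like Iceberg's `ReplaceStaticInvoke`. Running the `InSet`-producing rule before the downstream rule lets the latter fire; if the rewrite is deferred to filter translation, the optimizer's downstream rule only ever sees `ArrayContains` and does nothing.

```scala
// Toy model (NOT Spark's or Iceberg's API): why emitting InSet from an optimizer
// rule lets later InSet-matching rules fire, while a translation-time rewrite does not.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Lit(value: Any) extends Expr
// Stands in for a partition-transform call such as system.truncate(6, col1).
final case class Call(fn: String, args: Seq[Expr]) extends Expr
final case class ArrayLit(values: Seq[Any]) extends Expr
final case class ArrayContains(array: Expr, value: Expr) extends Expr
final case class InSet(value: Expr, set: Set[Any]) extends Expr
// Hypothetical marker for "a filter the source can use to prune partitions".
final case class PartitionFilter(transform: Call, set: Set[Any]) extends Expr

object Rules {
  type Rule = Expr => Expr

  // Rule 1: the proposed optimizer rule (simplified: literal array, no nulls).
  val replaceArrayContains: Rule = {
    case ArrayContains(ArrayLit(vs), v) if !vs.contains(null) => InSet(v, vs.toSet)
    case other => other
  }

  // Rule 2: a hypothetical downstream rule that only recognizes InSet over a
  // truncate(...) call; it never looks at ArrayContains.
  val pruneOnTransform: Rule = {
    case InSet(c @ Call("truncate", _), set) => PartitionFilter(c, set)
    case other => other
  }

  // Apply the rules in order, like a single pass of a rule batch.
  def optimize(rules: Seq[Rule])(e: Expr): Expr = rules.foldLeft(e)((acc, r) => r(acc))
}

object Demo {
  def main(args: Array[String]): Unit = {
    val truncated = Call("truncate", Seq(Lit(6), Attr("col1")))
    val filter = ArrayContains(ArrayLit(Seq("aaaaaa", "bbbbbb", "cccccc")), truncated)

    // Optimizer-rule approach: rule 1 emits InSet, so rule 2 can fire afterwards.
    println(Rules.optimize(Seq(Rules.replaceArrayContains, Rules.pruneOnTransform))(filter))

    // Translation-time approach: the optimizer never sees InSet, so rule 2 is a
    // no-op and the original ArrayContains is left for filter translation.
    println(Rules.optimize(Seq(Rules.pruneOnTransform))(filter))
  }
}
```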


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


