adrians commented on PR #50170: URL: https://github.com/apache/spark/pull/50170#issuecomment-2737117875
I prefer the optimizer-rule approach because its output plan can be further improved by other optimization rules, provided those rules are written to look for `InSet` nodes in the query plan. For example, the Iceberg extension has the [ReplaceStaticInvoke](https://github.com/apache/iceberg/blob/apache-iceberg-1.8.1/spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceStaticInvoke.scala#L72-L76) rule, which matches `InSet` nodes and allows efficient execution of queries like `SELECT * FROM tbl WHERE system.truncate(6, col1) IN ('aaaaaa', 'bbbbbb', 'cccccc')` (assuming that `truncate(6, col1)` is a partitioning column of the table). Implementing the change as a pattern match inside `DataSourceStrategy.translateFilterWithMapping` would mean that only the DataSource can improve on the execution plan, while higher-level optimizer rules are prevented from acting.
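
To make the composition argument concrete, here is a minimal toy sketch. It does not use Spark's Catalyst classes: `Expr`, `In`, `InSet`, `PartitionIn`, `toInSet`, and `pushPartitionFilter` are all hypothetical stand-ins invented for illustration, and the first rule is only an assumed example of a rewrite that produces `InSet`. The point is that a downstream rule keyed on `InSet` can only fire if an earlier optimizer rule has already produced that node.

```scala
// Toy model only: these are NOT Spark's Catalyst classes.
sealed trait Expr
case class Col(name: String) extends Expr
case class Lit(value: String) extends Expr
case class Call(fn: String, args: List[Expr]) extends Expr
case class In(value: Expr, list: List[Expr]) extends Expr
case class InSet(value: Expr, set: Set[String]) extends Expr
// Hypothetical node standing in for a source-specific partition filter.
case class PartitionIn(transform: String, column: String, set: Set[String]) extends Expr

object ToyOptimizer {
  type Rule = Expr => Expr

  // Hypothetical first rule: an IN over literals only becomes an InSet.
  val toInSet: Rule = {
    case In(v, list) if list.forall(_.isInstanceOf[Lit]) =>
      InSet(v, list.collect { case Lit(s) => s }.toSet)
    case other => other
  }

  // Hypothetical downstream rule (think of an extension's rule) that
  // recognises only InSet over a two-argument function call on a column.
  val pushPartitionFilter: Rule = {
    case InSet(Call(fn, List(_: Lit, Col(c))), set) => PartitionIn(fn, c, set)
    case other => other
  }

  // Apply the rules once each, in order, to the root expression.
  def optimize(e: Expr, rules: List[Rule]): Expr =
    rules.foldLeft(e)((acc, rule) => rule(acc))
}

object Demo {
  def main(args: Array[String]): Unit = {
    val predicate = In(
      Call("truncate", List(Lit("6"), Col("col1"))),
      List(Lit("aaaaaa"), Lit("bbbbbb"), Lit("cccccc")))

    // With the InSet-producing rule in the pipeline, the downstream rule fires.
    println(ToyOptimizer.optimize(
      predicate, List(ToyOptimizer.toInSet, ToyOptimizer.pushPartitionFilter)))

    // Without it, the downstream rule never sees an InSet and leaves the plan alone.
    println(ToyOptimizer.optimize(predicate, List(ToyOptimizer.pushPartitionFilter)))
  }
}
```

In this toy, moving the `In`-to-`InSet` rewrite out of the rule list and into a final translation step would correspond to the `translateFilterWithMapping` approach: the rewrite would still happen, but too late for `pushPartitionFilter` to act on it.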