adrians commented on PR #50170: URL: https://github.com/apache/spark/pull/50170#issuecomment-2740458065
Yes, there are benefits beyond the simple datasource pushdowns that I've shown in the benchmarks. This optimizer rule acts as a [strength-reduction](https://en.wikipedia.org/wiki/Strength_reduction) operation and allows other rules to be chained after it. Although the end result is a predicate pushdown, the project I'm working on needs the following sequence of transformations to happen in order (red marks the sections that are about to be removed, green marks the sections that are about to be added):

```mermaid
flowchart TD
    A["Inserted query"] -->|"SELECT * FROM tbl WHERE array_contains(<br><span style="color: red;">from_json(:payload, :schema).col1</span>, system.bucket(16, col1))"| B["[Spark Core]<br>Constant folding"]
    B -->|"SELECT * FROM tbl WHERE <span style="color: red;">array_contains</span>(<br><span style="color: green;">array(val1, val2, val3)</span>, system.bucket(16, col1))"| C["[This PR]<br>Replace ArrayContains with InSet nodes where applicable."]
    C -->|"SELECT * FROM tbl WHERE <br><span style="color: red;">system.bucket(16, col1)</span> <span style="color: green;">IN</span> (val1, val2, val3)"| D["[Iceberg extension]<br>Replace the StaticInvoke(system.bucket) node with ApplyFunctionExpression(system.bucket). Rather than evaluate the function for each row of the table, generate an expression that can be pushed down.<br>This rule looks for In/InSet nodes in the logical plan."]
    D -->|"SELECT * FROM tbl WHERE <span style="color: green;">partitionCol</span> IN<br> (val1, val2, val3)"| E["Iceberg DataSourceV2 implementation"]
```
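To make the "[This PR]" step concrete, here is a minimal standalone sketch of the rewrite, using a toy expression tree rather than the real Catalyst classes (`ArrayContains`, `CreateArray`, `In`, `Literal` here are hypothetical stand-ins for the nodes of the same name in `org.apache.spark.sql.catalyst.expressions`): once constant folding has reduced the array argument to a literal array, `array_contains(array(v1, v2, v3), e)` can be strength-reduced to `e IN (v1, v2, v3)`, which downstream rules can match on.

```scala
// Toy model of the ArrayContains -> In rewrite; NOT the actual Catalyst rule.
sealed trait Expr
case class Literal(value: Any) extends Expr
case class Attribute(name: String) extends Expr
case class CreateArray(elems: Seq[Expr]) extends Expr
case class ArrayContains(array: Expr, value: Expr) extends Expr
case class In(value: Expr, list: Seq[Expr]) extends Expr

object ReplaceArrayContainsWithIn {
  // Only fires when the array side is fully foldable (all elements are
  // literals), i.e. after the constant-folding step in the diagram above.
  def apply(e: Expr): Expr = e match {
    case ArrayContains(CreateArray(elems), v) if elems.forall(_.isInstanceOf[Literal]) =>
      In(v, elems)
    case other =>
      other // non-literal arrays are left untouched
  }
}
```

With this shape in the plan, a later rule (the Iceberg extension in the diagram) only has to pattern-match on `In`/`InSet` nodes instead of understanding `array_contains`.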