adrians commented on PR #50170:
URL: https://github.com/apache/spark/pull/50170#issuecomment-2740458065

   Yes, there are benefits beyond the simple datasource pushdowns that I've shown in the benchmarks.
   This optimizer rule acts as a [strength reduction](https://en.wikipedia.org/wiki/Strength_reduction), and it allows other rules to be chained onto its output.
   
   Although the end result is a predicate pushdown, the project that I'm working on needs the following sequence of transformations to happen in order (sections marked in red are about to be removed, and sections marked in green are about to be added):
   
   ```mermaid
   flowchart TD
   A["Inserted query"]
   -->|"SELECT * FROM tbl WHERE array_contains(<br><span style="color: red;">from_json(:payload, :schema).col1</span>, system.bucket(16, col1))"|B["[Spark Core]<br>Constant folding"]
   -->|"SELECT * FROM tbl WHERE <span style="color: red;">array_contains</span>(<br><span style="color: green;">array(val1, val2, val3)</span>, system.bucket(16, col1))"|C["[This PR]<br>Replace ArrayContains with InSet nodes where applicable."]
   -->|"SELECT * FROM tbl WHERE <br><span style="color: red;">system.bucket(16, col1)</span> <span style="color: green;">IN</span> (val1, val2, val3)"|D["[Iceberg extension]<br>Replace StaticInvoke(system.bucket) node with ApplyFunctionExpression(system.bucket). Rather than evaluate the function for each row of the table, generate an expression that can be pushed-down.<br>This rule looks for the In/InSet nodes in the logical plan."]
   -->|"SELECT * FROM tbl WHERE <span style="color: green;">partitionCol</span> IN<br> (val1, val2, val3)"|E["Iceberg DataSourceV2 implementation"]
   ```
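
To make the "[This PR]" step in the diagram concrete, here is a minimal sketch of the rewrite on a toy expression tree. It is plain Python, not Spark or Catalyst code: the classes `Column`, `ArrayLiteral`, `ArrayContains` and `InSet` and the functions `rewrite_array_contains` and `evaluate` are hypothetical stand-ins invented for this illustration, and null handling is ignored entirely.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Column:
    """Reference to a column of the current row (toy stand-in)."""
    name: str


@dataclass(frozen=True)
class ArrayLiteral:
    """An array whose elements are already constants, e.g. after constant folding."""
    items: Tuple[object, ...]


@dataclass(frozen=True)
class ArrayContains:
    """array_contains(array, needle)."""
    array: object
    needle: object


@dataclass(frozen=True)
class InSet:
    """child IN (values), with the values stored as a hash set."""
    child: object
    values: frozenset


def rewrite_array_contains(expr):
    """Rewrite array_contains(<constant array>, x) into x IN (<constants>).

    Any other expression is returned unchanged, which mirrors the
    "where applicable" wording in the diagram.
    """
    if isinstance(expr, ArrayContains) and isinstance(expr.array, ArrayLiteral):
        return InSet(expr.needle, frozenset(expr.array.items))
    return expr


def evaluate(expr, row):
    """Evaluate a toy expression against a row given as a dict (no null semantics)."""
    if isinstance(expr, Column):
        return row[expr.name]
    if isinstance(expr, ArrayLiteral):
        return list(expr.items)
    if isinstance(expr, ArrayContains):
        return evaluate(expr.needle, row) in evaluate(expr.array, row)
    if isinstance(expr, InSet):
        return evaluate(expr.child, row) in expr.values
    raise TypeError(f"unsupported expression: {expr!r}")


# array_contains(array(1, 5, 9), bucket) becomes bucket IN (1, 5, 9)
before = ArrayContains(ArrayLiteral((1, 5, 9)), Column("bucket"))
after = rewrite_array_contains(before)
```

In this sketch `after` is an `InSet` node, which is the shape a later rule (such as the Iceberg one in the diagram) can pattern-match on, while both forms evaluate to the same result for every non-null input.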


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

