Re: [I] [Proposal] Runtime Filters for DataFusion Comet [datafusion-comet]

via GitHub Wed, 07 Jan 2026 20:38:20 -0800


Shekharrajak commented on issue #3053:
URL: 
https://github.com/apache/datafusion-comet/issues/3053#issuecomment-3721844183


   ```
   Spark Plan (after InjectRuntimeFilter):
     Filter(BloomFilterMightContain(...))   => Comet can execute this
       CometScanExec(orders)                => Reads ALL rows
         └─ Native: Reads entire Parquet file
            └─ Then Filter operator filters rows (no I/O benefit)
   ```
   
   We would like to support this instead:
   
   ```
   Spark Plan (after InjectRuntimeFilter):
     Filter(BloomFilterMightContain(...))   => Created by Spark
       CometScanExec(orders, bloomFilters)  => Extract bloom filters
         └─ Native: Bloom filters pushed into ParquetSource
            └─ Filters applied DURING scan => I/O reduction
   ```
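
To make the difference between the two plans concrete, here is a rough toy model in Python. It is only a sketch under assumptions, not Comet or DataFusion code: `ToyBloomFilter`, `scan_then_filter`, `filter_during_scan`, and the in-memory `row_groups` layout are all hypothetical names made up for illustration, and "values read" simply counts decoded column values as a stand-in for I/O.

```python
import hashlib


class ToyBloomFilter:
    """Tiny Bloom filter for illustration only (not Spark's implementation)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # Python int used as a bit set

    def _positions(self, key):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def scan_then_filter(row_groups, bloom):
    """Current behaviour: decode every column, then filter the rows."""
    rows, values_read = [], 0
    for rg in row_groups:
        keys, payload = rg["order_key"], rg["payload"]
        values_read += len(keys) + len(payload)  # both columns fully decoded
        rows += [(k, p) for k, p in zip(keys, payload) if bloom.might_contain(k)]
    return rows, values_read


def filter_during_scan(row_groups, bloom):
    """Proposed behaviour: test the key column first, decode the rest lazily."""
    rows, values_read = [], 0
    for rg in row_groups:
        keys = rg["order_key"]
        values_read += len(keys)  # only the join-key column is decoded up front
        selected = [i for i, k in enumerate(keys) if bloom.might_contain(k)]
        values_read += len(selected)  # payload decoded only for surviving rows
        rows += [(keys[i], rg["payload"][i]) for i in selected]
    return rows, values_read


# Hypothetical "orders" table: 4 row groups of 100 rows each.
row_groups = [
    {
        "order_key": list(range(start, start + 100)),
        "payload": [f"row-{k}" for k in range(start, start + 100)],
    }
    for start in range(0, 400, 100)
]

# Bloom filter built from the (small) build side of the join.
bloom = ToyBloomFilter()
for build_key in (5, 250):
    bloom.add(build_key)

naive_rows, naive_read = scan_then_filter(row_groups, bloom)
pushed_rows, pushed_read = filter_during_scan(row_groups, bloom)
print("scan then filter  :", len(naive_rows), "rows,", naive_read, "values read")
print("filter during scan:", len(pushed_rows), "rows,", pushed_read, "values read")
```

Both functions return the same rows; the second one just avoids decoding the payload column for rows the Bloom filter rejects. How much of this a real Parquet reader can skip (pages, row groups, or only decoding work) depends on the reader, so the counts here are only indicative.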
   Please let me know if I am missing anything or if you would like me to explore a
different direction.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


