Re: [I] Enable parquet filter pushdown (`filter_pushdown`) by default [datafusion]

via GitHub Sun, 04 Jan 2026 10:02:14 -0800


adriangb commented on issue #3463:
URL: https://github.com/apache/datafusion/issues/3463#issuecomment-3708296137


   > DF enabling filter pushdown will not influence the IO pattern to disk, and therefore this cannot be responsible for the regression in performance
   
   Ah, maybe this is where my misunderstanding lies. I thought that it had a drastic impact on I/O performance. That is, if I have a query like `select * from wide_table where small_col = 1 and large_col = 'abc';`, turning filter pushdown on would effectively do something like:
   
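   ```sql
   -- Conceptual sketch: each CTE is a separate read of the file.
   with _filter_0 as (
     -- reads only small_col
     select small_col = 1 as mask from wide_table
   ), _filter_1 as (
     -- reads only large_col, for rows that passed _filter_0
     select large_col = 'abc' as mask from wide_table where <filter using _filter_0.mask>
   )
   -- reads the columns still needed for the output, for rows that passed _filter_1
   select * from wide_table where <filter using _filter_1.mask>
   ```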
   
   Here each CTE results in its own I/O: `_filter_0` reads `small_col` and `_filter_1` reads `large_col`. In this case we'd make 3 I/O fetches (`small_col`, then `large_col`, then the rest of the row) instead of the 1 we'd make if we evaluated without filter pushdown (ignoring splitting of byte ranges, etc.).
   
   


