adriangb commented on issue #3463:
URL: https://github.com/apache/datafusion/issues/3463#issuecomment-3708296137
> DF enabling filter pushdown will not influence the IO pattern to disk, and
> therefore this cannot be responsible for the regression in performance
Ah, maybe this is where my misunderstanding lies. I thought that it had a
drastic impact on I/O performance. That is, if I have a query like `select * from
wide_table where small_col = 1 and large_col = 'abc';`, turning filter pushdown
on would effectively do something like:
```sql
with _filter_0 as (
select small_col = 1 as mask from wide_table
), _filter_1 as (
select large_col = 'abc' as mask from wide_table where <filter using _filter_0.mask>
)
select * from wide_table where <filter using _filter_1.mask>
```
Here each CTE results in its own I/O: `_filter_0` reads `small_col` and
`_filter_1` reads `large_col`. In this case we'd make 3 I/O fetches with filter
pushdown, instead of the 1 we'd make if we evaluated without it
(ignoring splitting of byte ranges, etc.).
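To make the counting concrete, here is a toy Python sketch of the mental model described above. It is not DataFusion code and makes no claim about what the Parquet reader actually does. `plan_fetches` and `other_col` are hypothetical names, and the one-fetch-per-step rule is an assumption taken straight from the pseudo-SQL.

```python
def plan_fetches(filter_columns, all_columns, pushdown):
    """Return the list of column sets fetched, one entry per I/O step.

    Toy model only: assumes each filter column is fetched in its own step
    when pushdown is on, followed by one fetch for the final projection.
    """
    if not pushdown:
        # Everything is requested together and filtered in memory afterwards.
        return [list(all_columns)]
    # One fetch per filter column to build its mask (_filter_0, _filter_1, ...).
    fetches = [[col] for col in filter_columns]
    # Final `select *` over the rows that survived all masks.
    fetches.append(list(all_columns))
    return fetches


# Hypothetical schema for wide_table; only the two filter columns are from the query.
columns = ["small_col", "large_col", "other_col"]
filters = ["small_col", "large_col"]

print(len(plan_fetches(filters, columns, pushdown=False)))
print(len(plan_fetches(filters, columns, pushdown=True)))
```

Under these assumptions the model gives 1 fetch without pushdown and 3 with it, which is the count in question.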