alamb commented on issue #13983: URL: https://github.com/apache/datafusion/issues/13983#issuecomment-2613378310
> [@alamb](https://github.com/alamb) Excited to see further optimization of `late materialization`; I think it is a really important feature! I tried to use it in `HoraeDB` last year, and found the same problem mentioned in [#6921](https://github.com/apache/datafusion/pull/6921), which was frustrating...
>
> I will profile again with `datafusion.execution.parquet.pushdown_filters = true;` set, and see what optimizations we can do in `datafusion`.

Thanks @Rachelint

For this case I believe the core change needs to happen in the Parquet reader. The background, as I understand it, is described here:
- https://github.com/apache/arrow-rs/issues/5523

@XiangpengHao has a prototype in the following PR:
- https://github.com/apache/arrow-rs/pull/6921

A good next step would be to measure how much faster DataFusion is with that PR. The previous measurements had a few other optimizations mixed in.