alamb commented on issue #16200: URL: https://github.com/apache/datafusion/issues/16200#issuecomment-2913885348
> Yes very neat. I was actually thinking this would be along the other axis: loading metadata only for the _columns_ that are needed. My gut feeling is that a lot of compute is spent loading metadata for columns that aren't being filtered on. But I don't know if that's possible given the structure of the row group / page metadata.

I think we could certainly avoid loading page metadata for columns that aren't needed. We would probably have to add some sort of new API to [`ParquetMetaDataReader`](https://docs.rs/parquet/latest/parquet/file/metadata/struct.ParquetMetaDataReader.html).

One challenge / tradeoff that would be interesting (and required to handle) is that doing another async load to read more of the metadata will be very bad if that has to actually go to the object store again. Influx has it all cached in memory, so it doesn't matter there, but in general we need to be careful about adding additional requests.
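To make the request-count tradeoff concrete, here is a minimal sketch, assuming a simplified model where the footer tells us, per column chunk, the byte range of its page index (the Parquet footer does record an offset and length for each chunk's page index structures). `ChunkIndexLocation`, `ranges_for_columns`, and `coalesce` are hypothetical names for illustration only; they are not part of the `parquet` crate or `ParquetMetaDataReader`. The idea: fetch only the ranges for the columns that are needed, then merge nearby ranges so that several small reads become fewer object store requests, at the cost of reading some unneeded bytes.

```rust
use std::ops::Range;

/// Hypothetical: where one column chunk's page index lives in the file.
#[derive(Debug, Clone)]
struct ChunkIndexLocation {
    column: usize,
    range: Range<u64>,
}

/// Keep only the page index byte ranges for the needed columns, sorted by offset.
fn ranges_for_columns(chunks: &[ChunkIndexLocation], needed: &[usize]) -> Vec<Range<u64>> {
    let mut out: Vec<Range<u64>> = chunks
        .iter()
        .filter(|c| needed.contains(&c.column))
        .map(|c| c.range.clone())
        .collect();
    out.sort_by_key(|r| r.start);
    out
}

/// Merge sorted ranges whose gap is at most `max_gap` bytes, trading extra
/// bytes read for fewer requests to the object store.
fn coalesce(ranges: &[Range<u64>], max_gap: u64) -> Vec<Range<u64>> {
    let mut merged: Vec<Range<u64>> = Vec::new();
    for r in ranges {
        if let Some(last) = merged.last_mut() {
            if r.start <= last.end + max_gap {
                // Close enough: extend the previous request instead of issuing a new one.
                last.end = last.end.max(r.end);
                continue;
            }
        }
        merged.push(r.clone());
    }
    merged
}

fn total_bytes(ranges: &[Range<u64>]) -> u64 {
    ranges.iter().map(|r| r.end - r.start).sum()
}

fn main() {
    // Hypothetical layout: 3 columns x 2 row groups, page indexes stored back to back.
    let chunks = vec![
        ChunkIndexLocation { column: 0, range: 1000..1100 },
        ChunkIndexLocation { column: 1, range: 1100..1400 },
        ChunkIndexLocation { column: 2, range: 1400..1450 },
        ChunkIndexLocation { column: 0, range: 1450..1550 },
        ChunkIndexLocation { column: 1, range: 1550..1850 },
        ChunkIndexLocation { column: 2, range: 1850..1900 },
    ];

    // Only columns 0 and 2 appear in the filter.
    let needed = ranges_for_columns(&chunks, &[0, 2]);

    // No gap tolerance: 3 requests, 300 bytes.
    let tight = coalesce(&needed, 0);
    println!("{:?} -> {} requests, {} bytes", tight, tight.len(), total_bytes(&tight));

    // Gap tolerance of 300 bytes: 1 request, but it reads all 900 bytes.
    let loose = coalesce(&needed, 300);
    println!("{:?} -> {} requests, {} bytes", loose, loose.len(), total_bytes(&loose));
}
```

In this toy layout, skipping the unneeded column saves bytes only if we accept more requests; once the gaps are merged, we end up reading everything anyway. That is why a second round trip is unattractive when the metadata is not already cached in memory.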