berkaysynnada commented on issue #8227: URL: https://github.com/apache/datafusion/issues/8227#issuecomment-2721337649
> * DataFusion will be able to populate stats from Parquet files by reading the Parquet metadata, right? Will it do so lazily or would it do it eagerly in one go?

The `infer_stats()` API of `ParquetFormat` looks well implemented to me, but I haven't had the chance to try it in practice yet. I'm also not very familiar with how it integrates with the front end. Any additional insight on this would be appreciated by me too :)

> * Can I give DataFusion pre-computed statistics? In particular I have some of the stats in a secondary index. Could I pull some of those stats and feed them into DataFusion to avoid it having to fetch parquet metadata to e.g. decide it can skip processing a file altogether? If I did feed in partial stats, would DataFusion be able to fetch the remainder e.g. for other columns or for files I didn't provide stats for?

I would be very surprised if such a trick is possible. The `StatisticsV2` refactor mainly aimed to improve the internal handling of statistical information, rather than to change how it integrates with tables, optimizers, etc. Those aspects remain unchanged.
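
To make the second question concrete, here is a minimal, self-contained Rust sketch of what "partial stats with a fallback" could look like. It does not use DataFusion, and none of these names exist in DataFusion: `ColumnRange`, `FileStats`, `can_skip_file`, and `merge_stats` are hypothetical and only illustrate the idea. The assumption is that a missing statistic means "unknown", so a file may be skipped only when the known min/max range proves that no row can match.

```rust
use std::collections::HashMap;

/// Hypothetical min/max statistics for one column of one file.
/// This is NOT a DataFusion type; it exists only for illustration.
#[derive(Debug, Clone, Copy)]
struct ColumnRange {
    min: i64,
    max: i64,
}

/// Hypothetical per-file statistics.
/// A column that is absent from the map has unknown statistics.
#[derive(Debug, Default)]
struct FileStats {
    columns: HashMap<String, ColumnRange>,
}

/// Returns true only when the known statistics prove that no row in the
/// file can satisfy `column = value`. Unknown statistics never allow a skip.
fn can_skip_file(stats: &FileStats, column: &str, value: i64) -> bool {
    match stats.columns.get(column) {
        Some(range) => value < range.min || value > range.max,
        None => false,
    }
}

/// Fills in columns missing from the pre-computed statistics with values
/// obtained from a slower source (for example, the Parquet footer).
/// Pre-computed entries take precedence over fetched ones.
fn merge_stats(precomputed: FileStats, fetched: FileStats) -> FileStats {
    let mut merged = precomputed;
    for (name, range) in fetched.columns {
        merged.columns.entry(name).or_insert(range);
    }
    merged
}
```

With stats for `id` in `[100, 200]` taken from a secondary index, `can_skip_file(&stats, "id", 42)` returns `true`, while a lookup on a column with no stats returns `false` until the remainder is fetched and merged. Whether DataFusion offers an equivalent hook is exactly the open question here.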