berkaysynnada commented on issue #8227:
URL: https://github.com/apache/datafusion/issues/8227#issuecomment-2721337649

   > * DataFusion will be able to populate stats from Parquet files by reading 
the Parquet metadata, right? Will it do so lazily or would it do it eagerly in 
one go?
   
   The `infer_stats()` API of `ParquetFormat` looks well implemented to me, but I 
haven't had a chance to try it out yet. I'm also not very familiar with how 
it integrates with the front end. Any additional insights on this would be 
appreciated by me too :)
   
   > * Can I give DataFusion pre-computed statistics? In particular I have some 
of the stats in a secondary index. Could I pull some of those stats and feed 
them into DataFusion to avoid it having to fetch parquet metadata to e.g. 
decide it can skip processing a file altogether? If I did feed in partial 
stats, would DataFusion be able to fetch the remainder e.g. for other columns 
or for files I didn't provide stats for?
   
   I would be very surprised if such a trick were possible.
   
   The `StatisticsV2` refactor mainly aimed to improve the internal handling of 
statistical information, rather than to change how it integrates with tables, 
optimizers, etc. Those aspects remain unchanged.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

