alamb commented on issue #14481: URL: https://github.com/apache/datafusion/issues/14481#issuecomment-2685206026
> Would love to help on this issue. It can probably be generalized in some way but I'm open to any thoughts you have.

Thank you @AdamGS, that would be amazing. I suggest the following:

1. Write up an example showing how to enable the Parquet metadata cache
2. Figure out how to implement a simple in-memory cache (maybe default to 5MB or something)

Ideally, 2 would use the existing APIs for doing this. The APIs I was referring to are:

- https://docs.rs/datafusion/latest/datafusion/execution/cache/cache_manager/struct.CacheManager.html

You can see there are two APIs there, for statistics and metadata, but I don't know if they are still hooked up.

> We built something similar for Vortex based on [moka](https://docs.rs/moka/latest/moka/)

I think moka is likely overkill and too large a dependency for DataFusion itself, but being able to connect up a moka-based cache would be super helpful.

> and it also saves on roundtrips during infer_schema/infer_stats.

Indeed -- and making the schema / stats more efficient in general would be really good.
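For point 2 (a simple in-memory cache with a default budget of roughly 5MB), here is a minimal, std-only sketch. It assumes metadata is cached as raw footer bytes keyed by file path. `MetadataCache`, `InMemoryMetadataCache`, and `DEFAULT_CAPACITY_BYTES` are hypothetical names, not DataFusion's `CacheManager` API, and wiring this into `CacheManager` is not shown. The trait exists so that an externally supplied cache (for example, one built on moka) could plug in behind the same interface without DataFusion depending on it.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::{Arc, Mutex};

/// Hypothetical pluggable interface for cached Parquet footers, keyed by file path.
/// An external (e.g. moka-based) cache could implement this as well.
pub trait MetadataCache: Send + Sync {
    fn get(&self, path: &str) -> Option<Arc<Vec<u8>>>;
    fn put(&self, path: &str, metadata: Arc<Vec<u8>>);
}

/// Hypothetical default budget of 5 MB, following the suggestion above.
pub const DEFAULT_CAPACITY_BYTES: usize = 5 * 1024 * 1024;

struct Inner {
    entries: HashMap<String, Arc<Vec<u8>>>,
    // Keys ordered from least to most recently used.
    order: VecDeque<String>,
    // Sum of the byte lengths of all cached entries.
    used_bytes: usize,
}

impl Inner {
    // Move `path` to the most-recently-used end of the queue.
    fn touch(&mut self, path: &str) {
        if let Some(pos) = self.order.iter().position(|k| k.as_str() == path) {
            let key = self.order.remove(pos).unwrap();
            self.order.push_back(key);
        }
    }

    // Drop an existing entry for `path`, if any, and release its bytes.
    fn remove_entry(&mut self, path: &str) {
        if let Some(old) = self.entries.remove(path) {
            self.used_bytes -= old.len();
            if let Some(pos) = self.order.iter().position(|k| k.as_str() == path) {
                self.order.remove(pos);
            }
        }
    }
}

/// Byte-bounded, least-recently-used in-memory cache (hypothetical sketch).
pub struct InMemoryMetadataCache {
    capacity_bytes: usize,
    inner: Mutex<Inner>,
}

impl InMemoryMetadataCache {
    pub fn new(capacity_bytes: usize) -> Self {
        Self {
            capacity_bytes,
            inner: Mutex::new(Inner {
                entries: HashMap::new(),
                order: VecDeque::new(),
                used_bytes: 0,
            }),
        }
    }

    pub fn used_bytes(&self) -> usize {
        self.inner.lock().unwrap().used_bytes
    }
}

impl Default for InMemoryMetadataCache {
    fn default() -> Self {
        Self::new(DEFAULT_CAPACITY_BYTES)
    }
}

impl MetadataCache for InMemoryMetadataCache {
    fn get(&self, path: &str) -> Option<Arc<Vec<u8>>> {
        let mut inner = self.inner.lock().unwrap();
        let hit = inner.entries.get(path).cloned();
        if hit.is_some() {
            inner.touch(path);
        }
        hit
    }

    fn put(&self, path: &str, metadata: Arc<Vec<u8>>) {
        let size = metadata.len();
        let mut inner = self.inner.lock().unwrap();
        inner.remove_entry(path);
        // Never cache an entry larger than the whole budget.
        if size > self.capacity_bytes {
            return;
        }
        // Evict least-recently-used entries until the new one fits.
        while inner.used_bytes + size > self.capacity_bytes {
            let oldest = inner.order.pop_front().expect("non-empty while over budget");
            let evicted = inner.entries.remove(&oldest).expect("order and entries in sync");
            inner.used_bytes -= evicted.len();
        }
        inner.used_bytes += size;
        inner.entries.insert(path.to_string(), metadata);
        inner.order.push_back(path.to_string());
    }
}
```

Eviction here scans the queue in O(n) to keep the sketch short. A real implementation would want an O(1) LRU structure, or could simply hand the job to a moka-backed implementation of the same trait.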