Re: [I] Slowdown in ClickBench Q36-Q37 between DataFusion 43.0.0 and 44.0.0 [datafusion]

via GitHub Wed, 26 Feb 2025 06:32:10 -0800


alamb commented on issue #14481:
URL: https://github.com/apache/datafusion/issues/14481#issuecomment-2685206026


   > Would love to help on this issue. 
   
   
   It can probably be generalized in some way but I'm open to any thoughts you 
have.
   
   Thank you @AdamGS that would be amazing
   
   I suggest the following:
   1. Write up an example showing how to enable the parquet metadata cache
   2. Figure out how to implement a simple in-memory cache (maybe default to 5MB or something)
   
   Ideally 2 would use the existing APIs for doing this
   
   The APIs I was referring to are: 
   - 
https://docs.rs/datafusion/latest/datafusion/execution/cache/cache_manager/struct.CacheManager.html
   
   You can see there are two APIs there for statistics and metadata but I don't 
know if they are still hooked up 
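
   To make step 2 concrete, here is a minimal sketch of a size-bounded in-memory cache using only the Rust standard library. It is not DataFusion's API: the name `BoundedMetadataCache`, the string path key, the raw-bytes value, and the FIFO eviction policy are all assumptions chosen for illustration. A real implementation would presumably store parsed parquet metadata and plug into the existing `CacheManager` hooks if they are still wired up.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::Arc;

/// Hypothetical size-bounded cache keyed by object path (illustration only).
/// Values are opaque bytes here; a real cache would likely hold parsed metadata.
pub struct BoundedMetadataCache {
    max_bytes: usize,
    used_bytes: usize,
    entries: HashMap<String, Arc<Vec<u8>>>,
    // Insertion order, oldest first, used for FIFO eviction.
    order: VecDeque<String>,
}

impl BoundedMetadataCache {
    /// 5 MB default, following the figure floated in the comment above.
    pub const DEFAULT_MAX_BYTES: usize = 5 * 1024 * 1024;

    pub fn new(max_bytes: usize) -> Self {
        Self {
            max_bytes,
            used_bytes: 0,
            entries: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Returns a shared handle to the cached bytes, if present.
    pub fn get(&self, path: &str) -> Option<Arc<Vec<u8>>> {
        self.entries.get(path).cloned()
    }

    /// Inserts an entry, evicting the oldest entries until it fits.
    /// Returns false and leaves the cache unchanged if the value alone
    /// exceeds the byte budget.
    pub fn put(&mut self, path: &str, value: Vec<u8>) -> bool {
        if value.len() > self.max_bytes {
            return false;
        }
        // Drop any existing entry for this path so its size is not counted twice.
        self.remove(path);
        while self.used_bytes + value.len() > self.max_bytes {
            match self.order.pop_front() {
                Some(oldest) => {
                    if let Some(old) = self.entries.remove(&oldest) {
                        self.used_bytes -= old.len();
                    }
                }
                None => break,
            }
        }
        self.used_bytes += value.len();
        self.entries.insert(path.to_string(), Arc::new(value));
        self.order.push_back(path.to_string());
        true
    }

    /// Removes an entry and returns it, if present.
    pub fn remove(&mut self, path: &str) -> Option<Arc<Vec<u8>>> {
        let removed = self.entries.remove(path)?;
        self.used_bytes -= removed.len();
        self.order.retain(|p| p != path);
        Some(removed)
    }

    /// Total size in bytes of all cached values.
    pub fn used_bytes(&self) -> usize {
        self.used_bytes
    }
}
```

   A caller would construct it with `BoundedMetadataCache::new(BoundedMetadataCache::DEFAULT_MAX_BYTES)`, call `get` before fetching a file footer, and call `put` after a miss. FIFO eviction is used here only because it is short; an LRU policy is probably a better fit for repeated queries over the same files.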
   
   > We built something similar for Vortex based on 
[moka](https://docs.rs/moka/latest/moka/)
   
   I think moka is likely overkill and too large a dependency for datafusion itself, but being able to connect up a moka-based cache would be super helpful.
   
   >  and it also saves on roundtrips during infer_schema/infer_stats. 
   
   Indeed -- and making the schema / stats more efficient in general would be 
really good
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


