Re: [I] Cache Parquet Metadata [datafusion]

2025-08-02 Thread via GitHub
alamb closed issue #15582: Cache Parquet Metadata URL: https://github.com/apache/datafusion/issues/15582 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [I] Cache Parquet Metadata [datafusion]

2025-07-28 Thread via GitHub
alamb commented on issue #15582: URL: https://github.com/apache/datafusion/issues/15582#issuecomment-3128707387 > |100M | 21.2953s | 1.6018s | 13.2943x faster | 13x faster 👏 https://github.com/user-attachments/assets/d1ed4083-e0ee-46fa-a23d-d90d9fa05d52 I w

Re: [I] Cache Parquet Metadata [datafusion]

2025-07-28 Thread via GitHub
nuno-faria commented on issue #15582: URL: https://github.com/apache/datafusion/issues/15582#issuecomment-3128489298 @alamb is this still something that would benefit upstream DataFusion? I've implemented the caching of Parquet metadata after noticing that a large amount of time spent

Re: [I] Cache Parquet Metadata [datafusion]

2025-06-25 Thread via GitHub
JigaoLuo commented on issue #15582: URL: https://github.com/apache/datafusion/issues/15582#issuecomment-3005738276 As a DataFusion user, I’m wondering if the Parquet footer is cached during reads. [This blog mentioned footer caching](https://blog.xiangpeng.systems/posts/caching-datafusion/#

Re: [I] Cache Parquet Metadata [datafusion]

2025-06-25 Thread via GitHub
alamb commented on issue #15582: URL: https://github.com/apache/datafusion/issues/15582#issuecomment-3005802982 > As a DataFusion user, I’m wondering if the Parquet footer could be cached. [This blog mentioned footer caching](https://blog.xiangpeng.systems/posts/caching-datafusion/#parquet-

Re: [I] Cache Parquet Metadata [datafusion]

2025-05-07 Thread via GitHub
adriangb commented on issue #15582: URL: https://github.com/apache/datafusion/issues/15582#issuecomment-2858701825 I'll mention that we now avoid reading metadata entirely for a lot of queries using an approach along the lines of https://github.com/apache/datafusion/issues/15585

Re: [I] Cache Parquet Metadata [datafusion]

2025-04-06 Thread via GitHub
alamb commented on issue #15582: URL: https://github.com/apache/datafusion/issues/15582#issuecomment-2781382025 > I would be happy to share / upstream any work I do on this if there is interest. Thanks @matthewmturner -- what I think would be really valuable is if you could prov

Re: [I] Cache Parquet Metadata [datafusion]

2025-04-04 Thread via GitHub
matthewmturner commented on issue #15582: URL: https://github.com/apache/datafusion/issues/15582#issuecomment-2780145360 I am working on this for `dft` right now actually and I plan on integrating it into the observability feature that I have been working on (where different observability m

[I] Cache Parquet Metadata [datafusion]

2025-04-04 Thread via GitHub
alamb opened a new issue, #15582: URL: https://github.com/apache/datafusion/issues/15582 ### Is your feature request related to a problem or challenge? When looking at some Samply profiles of ClickBench queries on my laptop, it appears there are several times where processing stalls