asayers opened a new issue, #16676: URL: https://github.com/apache/datafusion/issues/16676
### Describe the bug

Parquet metadata load time was 15 _seconds_ with the default configuration. Setting `target_partitions` to 1 brought it down to 15 _milliseconds_.

With the default config I was seeing 20 file groups in the `DataSourceExec` (since my machine has 20 cores), and loading the metadata was taking forever. Forcing `target_partitions` to 1 fixed it: I now see just 1 file group in the `DataSourceExec`, metadata load time is down, and my plan no longer requires a `MergeExec`.

### To Reproduce

The query I was running is very basic: just filter + sort + select. The parquet file is 130.00 MiB with 1.08 MiB of metadata (6.42 MiB when expanded in memory).

### Expected behavior

I wouldn't have thought going from 1 file group to 20 would slow down metadata parsing by 1000x.

### Additional context

_No response_
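
To illustrate the 20 file groups described above, here is a minimal sketch of how a single 130 MiB file could be divided into one byte range per partition. It assumes the scan is split evenly by byte range across `target_partitions`. That is an assumption about the planner, not something confirmed in this report, and `split_into_file_groups` is a hypothetical helper, not a DataFusion API.

```rust
/// Hypothetical helper (not part of DataFusion): split a file of `file_len`
/// bytes into at most `partitions` contiguous, near-equal byte ranges,
/// one per file group. Ranges are half-open: (start, end).
fn split_into_file_groups(file_len: u64, partitions: u64) -> Vec<(u64, u64)> {
    assert!(partitions > 0, "need at least one partition");

    // Ceiling division so the ranges always cover the whole file.
    let chunk = ((file_len + partitions - 1) / partitions).max(1);

    let mut ranges = Vec::new();
    let mut start = 0;
    while start < file_len {
        let end = (start + chunk).min(file_len);
        ranges.push((start, end));
        start = end;
    }
    ranges
}
```

Under this assumption, calling `split_into_file_groups(130 * 1024 * 1024, 20)` yields 20 ranges of 6.5 MiB each, while passing `1` yields a single range covering the whole file, which matches the single file group seen with `target_partitions` set to 1.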