alamb commented on issue #16200: URL: https://github.com/apache/datafusion/issues/16200#issuecomment-2930452684
I feel like we may need to add some sort of policy, as this same tradeoff is coming up when implementing filter_pushdown optimizations. Namely, is it important to minimize IO operations, or are more IO operations ok if they reduce CPU/Memory requirements? As @adriangb and @etseidl say, this tradeoff is quite different depending on local vs object store. Maybe we could make some sort of ObjectStore based interface that allows the parquet reader to hint what data might be necessary (e.g. the entire range of metadata / pages before pruning) and then allows the lower level system to decide if it wants to prefetch, buffer or just pass through the request 🤔
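
To make the hinting idea a bit more concrete, here is a rough sketch of what such an interface might look like. Everything in it is hypothetical: `RangeHint`, `FetchPlan`, `LocalDisk`, `RemoteStore` and `max_wasted_bytes` are made-up names for illustration and are not part of the `object_store` or `parquet` crates. The reader passes in the byte ranges it might need, and each backend decides whether to coalesce them into one larger request or pass them through unchanged.

```rust
use std::ops::Range;

/// Hypothetical: what a backend decides to do with a hint from the reader.
#[derive(Debug, PartialEq)]
enum FetchPlan {
    /// Issue one request covering the whole hinted span and buffer the result.
    Coalesce(Range<u64>),
    /// Fetch each hinted range on its own, with no extra bytes read.
    PassThrough(Vec<Range<u64>>),
}

/// Hypothetical trait: the reader hints which byte ranges it may need
/// (e.g. all metadata / pages before pruning); the backend picks a plan.
trait RangeHint {
    fn plan(&self, hinted: &[Range<u64>]) -> FetchPlan;
}

/// Assumed local-disk backend: extra reads are cheap, so never over-fetch.
struct LocalDisk;

/// Assumed remote backend: each request is expensive, so it accepts reading
/// up to `max_wasted_bytes` of unneeded data to save round trips.
struct RemoteStore {
    max_wasted_bytes: u64,
}

impl RangeHint for LocalDisk {
    fn plan(&self, hinted: &[Range<u64>]) -> FetchPlan {
        FetchPlan::PassThrough(hinted.to_vec())
    }
}

impl RangeHint for RemoteStore {
    fn plan(&self, hinted: &[Range<u64>]) -> FetchPlan {
        let mut sorted = hinted.to_vec();
        sorted.sort_by_key(|r| r.start);
        if sorted.is_empty() {
            return FetchPlan::PassThrough(sorted);
        }
        // Span that a single request would have to cover.
        let start = sorted[0].start;
        let end = sorted.iter().map(|r| r.end).max().unwrap();
        // Bytes the reader actually asked for (overlaps counted twice,
        // which only makes the waste estimate smaller).
        let wanted: u64 = sorted.iter().map(|r| r.end - r.start).sum();
        let wasted = (end - start).saturating_sub(wanted);
        if wasted <= self.max_wasted_bytes {
            FetchPlan::Coalesce(start..end)
        } else {
            FetchPlan::PassThrough(sorted)
        }
    }
}
```

With something like this, the IO vs CPU/Memory policy would live in the backend (here a single threshold) rather than in the parquet reader, so local and object store deployments could make different choices from the same hint.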