alamb commented on issue #16200:
URL: https://github.com/apache/datafusion/issues/16200#issuecomment-2930452684

   I feel like we may need some sort of policy here, as this same tradeoff is 
coming up when implementing filter_pushdown optimizations. Namely, is it 
important to minimize IO operations, or are more IO operations ok if they 
reduce CPU/memory requirements?
   
   As @adriangb and @etseidl say, this tradeoff is quite different depending on 
whether the data lives on local storage or in an object store.
   
   Maybe we could make some sort of ObjectStore-based interface that allows the 
parquet reader to hint what data might be necessary (e.g. the entire range of 
metadata / pages before pruning) and then lets the lower-level system 
decide whether to prefetch, buffer, or just pass through the request 🤔 
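
   To make the idea a bit more concrete, here is a rough, std-only sketch of 
what such a hint could look like. Every name in it (`RangeStore`, 
`CountingStore`, `HintPolicy`, `HintedReader`, `hint`, `io_requests`) is 
made up for illustration; none of this is the `object_store` crate's API or 
the parquet reader's, and a real design would likely be async and bound the 
buffer size.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a byte-range store (NOT the `object_store` API).
pub trait RangeStore {
    /// Fetch `range`; each call models one IO request.
    fn get_range(&mut self, range: Range<usize>) -> Vec<u8>;
}

/// In-memory store that counts how many IO requests it served.
pub struct CountingStore {
    data: Vec<u8>,
    pub requests: usize,
}

impl CountingStore {
    pub fn new(data: Vec<u8>) -> Self {
        Self { data, requests: 0 }
    }
}

impl RangeStore for CountingStore {
    fn get_range(&mut self, range: Range<usize>) -> Vec<u8> {
        self.requests += 1;
        self.data[range].to_vec()
    }
}

/// What the lower-level system decides to do with a hint (assumed policies).
pub enum HintPolicy {
    /// Ignore hints and issue one request per read (e.g. local disk).
    PassThrough,
    /// Fetch the hinted range once and serve later reads from the buffer
    /// (e.g. a remote object store where requests are expensive).
    Prefetch,
}

pub struct HintedReader<S: RangeStore> {
    store: S,
    policy: HintPolicy,
    buffer: Option<(Range<usize>, Vec<u8>)>,
}

impl<S: RangeStore> HintedReader<S> {
    pub fn new(store: S, policy: HintPolicy) -> Self {
        Self { store, policy, buffer: None }
    }

    /// The reader says which bytes it *might* need (e.g. all pages before
    /// pruning); whether anything is fetched is up to the policy.
    pub fn hint(&mut self, range: Range<usize>) {
        if let HintPolicy::Prefetch = self.policy {
            let bytes = self.store.get_range(range.clone());
            self.buffer = Some((range, bytes));
        }
    }

    /// Serve a read from the buffer when it is fully covered, else hit the store.
    pub fn read(&mut self, range: Range<usize>) -> Vec<u8> {
        if let Some((buffered, bytes)) = &self.buffer {
            if buffered.start <= range.start && range.end <= buffered.end {
                let offset = range.start - buffered.start;
                let len = range.end - range.start;
                return bytes[offset..offset + len].to_vec();
            }
        }
        self.store.get_range(range)
    }

    pub fn into_inner(self) -> S {
        self.store
    }
}

/// Hint a 100-byte range, read two small pieces of it, and report how many
/// IO requests reached the store under the given policy.
pub fn io_requests(policy: HintPolicy) -> usize {
    let store = CountingStore::new((0..100u8).collect());
    let mut reader = HintedReader::new(store, policy);
    reader.hint(0..100);
    reader.read(10..20);
    reader.read(50..60);
    reader.into_inner().requests
}
```

   With the pass-through policy the two reads in `io_requests` turn into two 
requests for 20 bytes total, while the prefetch policy makes a single request 
but pulls and buffers all 100 hinted bytes. That is the same IO vs CPU/memory 
tradeoff, just moved behind the interface so the storage layer can pick.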


