Hello,

For the datasets project (*), one requirement is for Arrow to grow a filesystem abstraction. The aim is to access various kinds of storage systems (local filesystem, S3, HadoopFS...) with a single API. Hopefully, the API can be made good enough to avoid inefficiencies.

I've pushed a draft PR with a simple API proposal:
https://github.com/apache/arrow/pull/4225

This PR is meant as a starting point for discussion. If you have any insight or experience on the subject, please review and give suggestions / comments.

(*) https://docs.google.com/document/d/1DCPwA6gF-Uy-rlHoVL60j-I-b1L7n1aqKLie2L3U50k/edit

Regards

Antoine.