Hi Antoine,

Thank you for starting this discussion.
I left some comments on the PR. I had been looking previously at
TensorFlow's file system APIs ([1], and various implementations) for
some possible guidance around this, though since Arrow is intended as
a development platform / reusable set of libraries, our use cases are
a bit more general purpose than TF's.

To Romain and the R folks, and Kou and the Ruby folks: it would be
great to get your feedback on this as well, since you can make use of
this functionality in R, C GLib, and Ruby.

- Wes

[1] https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/file_system.h

On Mon, Apr 29, 2019 at 11:26 AM Antoine Pitrou <solip...@pitrou.net> wrote:
>
> Hello,
>
> For the datasets project (*), one requirement is for Arrow to grow a
> filesystem abstraction. The aim is to access various kinds of storage
> systems (local filesystem, S3, HadoopFS...) with a single API.
> Hopefully, the API can be made good enough to avoid inefficiencies.
>
> I've pushed a draft PR with a simple API proposal in:
> https://github.com/apache/arrow/pull/4225
>
> This PR is meant as a starting point for discussion. If you have any
> insight or experience on the subject, please review and give
> suggestions / comments.
>
> (*) https://docs.google.com/document/d/1DCPwA6gF-Uy-rlHoVL60j-I-b1L7n1aqKLie2L3U50k/edit
>
> Regards
>
> Antoine.