Hi Antoine,

Thank you for starting this discussion.

I left some comments on the PR. I had previously been looking at
TensorFlow's file system APIs ([1], and various implementations) for
possible guidance on this, though since Arrow is intended as a
development platform / reusable set of libraries, our use cases are a
bit more general-purpose than TF's.

To Romain and the R folks, and Kou and the Ruby folks: it would be
great to get your feedback on this as well, since you can make use of
this functionality in R, C GLib, and Ruby.

- Wes

[1] https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/file_system.h

On Mon, Apr 29, 2019 at 11:26 AM Antoine Pitrou <solip...@pitrou.net> wrote:
>
>
> Hello,
>
> For the datasets project (*), one requirement is for Arrow to grow a
> filesystem abstraction.  The aim is to access various kinds of storage
> systems (local filesystem, S3, HadoopFS...) with a single API.
> Hopefully, the API can be made good enough to avoid inefficiencies.
>
> I've pushed a draft PR with a simple API proposal in:
> https://github.com/apache/arrow/pull/4225
>
> This PR is meant as a starting point for discussion.  If you have any
> insight or experience on the subject, please review and give
> suggestions / comments.
>
> (*) https://docs.google.com/document/d/1DCPwA6gF-Uy-rlHoVL60j-I-b1L7n1aqKLie2L3U50k/edit
>
> Regards
>
> Antoine.
>
>
