[ 
https://issues.apache.org/jira/browse/ARROW-9938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miles Granger reassigned ARROW-9938:
------------------------------------

    Assignee: Miles Granger

> [Python] Add filesystem capabilities to other IO formats (feather, csv, json, 
> ..)
> ---------------------------------------------------------------------------------
>
>                 Key: ARROW-9938
>                 URL: https://issues.apache.org/jira/browse/ARROW-9938
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Assignee: Miles Granger
>            Priority: Major
>              Labels: filesystem, good-first-issue
>
> In the parquet IO functions, we support reading/writing files from non-local 
> filesystems directly (in addition to passing a buffer) by:
> - passing a URI (e.g. {{pq.read_table("s3://bucket/data.parquet")}})
> - specifying the filesystem keyword (e.g.
> {{pq.read_table("bucket/data.parquet", filesystem=S3FileSystem(...))}})
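> For illustration, a minimal local sketch of those two equivalent access styles (using a {{file://}} URI and {{LocalFileSystem}} as stand-ins for S3, since the call signatures are the same):
> {code:python}
> import os, tempfile
> import pyarrow as pa
> import pyarrow.parquet as pq
> from pyarrow import fs
>
> # write a small table to a temporary local parquet file
> path = os.path.join(tempfile.mkdtemp(), "data.parquet")
> pq.write_table(pa.table({"x": [1, 2, 3]}), path)
>
> # style 1: pass a URI (file:// here, s3:// in the issue)
> t1 = pq.read_table("file://" + path)
>
> # style 2: pass a path plus an explicit filesystem keyword
> t2 = pq.read_table(path, filesystem=fs.LocalFileSystem())
> {code}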
> On the other hand, for other file formats such as feather, we only support 
> local files or buffers. So for those, you need the more manual approach (I 
> _suppose_ this works?):
> {code:python}
> from pyarrow import fs, feather
> s3 = fs.S3FileSystem()
> # open the remote file and pass the resulting file object to the reader
> with s3.open_input_file("bucket/data.arrow") as file:
>     table = feather.read_table(file)
> {code}
> So I think the question comes up: do we want to extend this filesystem 
> support to other file formats (feather, csv, json) and make this more uniform 
> across pyarrow, or do we prefer to keep the plain readers more low-level (and 
> let people use the datasets API for more convenience)?
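> As a reference point, the datasets API already exposes a uniform {{filesystem}} keyword for non-parquet formats; a sketch against the local filesystem rather than S3:
> {code:python}
> import os, tempfile
> import pyarrow as pa
> import pyarrow.dataset as ds
> from pyarrow import fs, feather
>
> # write a feather (Arrow IPC) file locally
> path = os.path.join(tempfile.mkdtemp(), "data.arrow")
> feather.write_feather(pa.table({"x": [1, 2, 3]}), path)
>
> # ds.dataset takes format= and filesystem= for any supported format
> dataset = ds.dataset(path, format="feather", filesystem=fs.LocalFileSystem())
> table = dataset.to_table()
> {code}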
> cc [~apitrou] [~kszucs]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
