[ https://issues.apache.org/jira/browse/ARROW-9938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Miles Granger reassigned ARROW-9938:
------------------------------------

    Assignee: Miles Granger

> [Python] Add filesystem capabilities to other IO formats (feather, csv, json, ..)
> ---------------------------------------------------------------------------------
>
>                 Key: ARROW-9938
>                 URL: https://issues.apache.org/jira/browse/ARROW-9938
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Assignee: Miles Granger
>            Priority: Major
>              Labels: filesystem, good-first-issue
>
> In the parquet IO functions, we support reading/writing files from non-local
> filesystems directly (in addition to passing a buffer) by:
> - passing a URI (e.g. {{pq.read_table("s3://bucket/data.parquet")}})
> - specifying the filesystem keyword (e.g.
> {{pq.read_table("bucket/data.parquet", filesystem=S3FileSystem(...))}})
> On the other hand, for other file formats such as feather, we only support
> local files or buffers. So for those, you currently need the more manual
> approach (I _suppose_ this works?):
> {code:python}
> from pyarrow import fs, feather
>
> s3 = fs.S3FileSystem()
> with s3.open_input_file("bucket/data.arrow") as file:
>     table = feather.read_table(file)
> {code}
> So the question comes up: do we want to extend this filesystem support to
> other file formats (feather, csv, json) and make this more uniform across
> pyarrow, or do we prefer to keep the plain readers more low-level (so that
> people can use the datasets API for more convenience)?
> cc [~apitrou] [~kszucs]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)