I am using pa.PythonFile() to wrap the file-like object provided by the s3fs package. I am able to write Parquet files directly to S3 this way. I am not reading with pyarrow (I'm reading gzipped CSVs with plain Python), but I imagine reading would work much the same.
-- sent from my phone --

> On Jun 22, 2017, at 00:54, Kevin Moore <ke...@quiltdata.io> wrote:
>
> Has anyone started looking into how to read data sets from S3? I started
> looking into it and wondered if anyone has a design in mind.
>
> We could implement an S3FileSystem class in pyarrow/filesystem.py. The
> filesystem components could probably be written against the AWS Python SDK.
>
> The HDFS file system and file classes, however, are implemented at least
> partially in Cython & C++. Is there an advantage to doing that for S3 too?
>
> Thanks,
>
> Kevin
>
> ----
> Kevin Moore
> CEO, Quilt Data, Inc.
> ke...@quiltdata.io | LinkedIn <https://www.linkedin.com/in/kevinemoore/>
> (415) 497-7895
>
> Data packages for fast, reproducible data science
> quiltdata.com