I am using pa.PythonFile() to wrap the file-like object provided by the s3fs package. I am able to write Parquet files directly to S3 this way. I am not reading with pyarrow (I'm reading gzipped CSVs with plain Python), but I imagine reading would work much the same.
-- sent from my phone --

> On Jun 22, 2017, at 00:54, Kevin Moore <ke...@quiltdata.io> wrote:
>
> Has anyone started looking into how to read data sets from S3? I started
> looking into it and wondered if anyone has a design in mind.
>
> We could implement an S3FileSystem class in pyarrow/filesystem.py. The
> filesystem components could probably be written against the AWS Python SDK.
>
> The HDFS file system and file classes, however, are implemented at least
> partially in Cython & C++. Is there an advantage to doing that for S3 too?
>
> Thanks,
>
> Kevin
>
> ----
> Kevin Moore
> CEO, Quilt Data, Inc.
> ke...@quiltdata.io | LinkedIn <https://www.linkedin.com/in/kevinemoore/>
> (415) 497-7895
>
> Data packages for fast, reproducible data science
> quiltdata.com