Re: [Py] writing 2- or 4-byte decimal columns to Parquet

2018-04-19 Thread Colin Nichols
…quet/arrow/writer.cc#L798 > The size of the type depends on the decimal precision, so if we can write to 32- or 64-bit, then we do that. Writing to INT32 or INT64 would be more complicated and require some work in parquet-cpp…
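The sizing rule the quoted writer code alludes to can be sketched in plain Python. The thresholds (precision ≤ 9 fits an INT32, ≤ 18 fits an INT64) come from the Parquet logical-type spec; the function name here is illustrative, not parquet-cpp's:

```python
def min_decimal_bytes(precision: int) -> int:
    """Smallest byte width whose signed two's-complement range can hold
    any unscaled decimal value with the given precision."""
    max_unscaled = 10 ** precision - 1
    n = 1
    while max_unscaled > 2 ** (8 * n - 1) - 1:
        n += 1
    return n

# Parquet's thresholds: precision <= 9 fits INT32, <= 18 fits INT64
assert min_decimal_bytes(9) <= 4 and min_decimal_bytes(10) > 4
assert min_decimal_bytes(18) <= 8 and min_decimal_bytes(19) > 8
```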

Re: [Py] writing 2- or 4-byte decimal columns to Parquet

2018-04-18 Thread Colin Nichols
Hi all, Any thoughts on the below? I did a little more code browsing and I'm not sure this is supported right now; should I open a Jira ticket? - Colin On Tue, Apr 17, 2018 at 11:11 PM, Colin Nichols wrote: > Hi there, > > I know (py)arrow has the decimal128() type, and using

[Py] writing 2- or 4-byte decimal columns to Parquet

2018-04-17 Thread Colin Nichols
Hi there, I know (py)arrow has the decimal128() type, and using this type it's easy to take an array of Python Decimals, convert to a pa.array, and write out to Parquet. In the absence (afaict) of decimal32 and decimal64 types, is it possible to go from an array of Decimals (with compatible preci

Arrow for Redshift Spectrum

2017-09-28 Thread Colin Nichols
Hi all, Would love to get some feedback on a little project I put together. I paired my company's Parquet conversion routines (wrapper around pyarrow) with SQLAlchemy's table reflection capabilities to make an "easy mode" Redshift --> Redshift Spectrum converter. You can find it here: https://

Re: pyarrow versioning

2017-08-07 Thread Colin Nichols
Ah ok, makes sense -- thanks Wes! - Colin *Colin Nichols | Senior Software Engineer | 335 Madison Avenue, 16F | New York, NY, 10017 | +1 (646) 912 2018 | BAM.ai* On Mon, Aug 7, 2017 at 3:55 PM, Wes McKinney wrote: > hi Colin, > > Sorry about that. Yes, I pulled the 0.5.0 packages from Py

pyarrow versioning

2017-08-07 Thread Colin Nichols
Hi all, I noticed today that pyarrow==0.5.0 has disappeared from PyPI, replaced by 0.5.0.post2. Just wanted to make sure that was intended. If so, is the expectation that users put e.g. pyarrow~=0.5.0 in their requirements file as opposed to pyarrow==0.5.0? Thank you, Colin
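For reference, `~=` is PEP 440's compatible-release specifier; a hypothetical requirements.txt fragment shows what it admits:

```
# requirements.txt
pyarrow~=0.5.0   # equivalent to: pyarrow >= 0.5.0, == 0.5.*
                 # matches 0.5.0, 0.5.0.post2, 0.5.1, ... but not 0.6.0
```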

Re: [VOTE] Release Apache Arrow 0.5.0 - RC2

2017-07-23 Thread Colin Nichols
+1 - Ran manylinux1 Python build+tests - Installed resulting wheel on Ubuntu 16.04. Had to update imports of TimestampType in my code since it was removed from the top level. I am using it for type checking of columns. -- sent from my phone -- > On Jul 23, 2017, at 10:33, Julien Le Dem wrote:

Re: Implementing (ARROW-1119) [Python] Enable reading Parquet data sets from Amazon S3

2017-06-22 Thread Colin Nichols
I am using a pa.PythonFile() wrapping the file-like object provided by the s3fs package. I am able to write parquet files directly to S3 this way. I am not reading with pyarrow (I'm reading gzipped CSVs with plain Python) but I imagine it would work much the same. -- sent from my phone -- > On Jun 22, 201

[jira] [Created] (ARROW-1120) Write support for int96

2017-06-15 Thread colin nichols (JIRA)
colin nichols created ARROW-1120: Summary: Write support for int96 Key: ARROW-1120 URL: https://issues.apache.org/jira/browse/ARROW-1120 Project: Apache Arrow Issue Type: Improvement