Wes & Phillip, thank you both for doing some investigating. Really interesting -- afaict I should be taking advantage of the column-width shrinking, but the error messages I'm seeing from Redshift Spectrum suggest otherwise. More info: https://github.com/hellonarrativ/spectrify/issues/14
I'm probably doing something silly; hopefully I can help improve the docs at least :) Probably best to continue the conversation in the Spectrify issue until there's more info.

Best,
Colin

On Thu, Apr 19, 2018 at 9:54 AM, Phillip Cloud <cpcl...@gmail.com> wrote:
> That's right. Shrinking happens here:
> https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/writer.cc#L808-L809
>
> On Thu, Apr 19, 2018 at 9:40 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > We do "shrink" the input 128-bit decimals to the smallest number of bytes that fits, though, is that right?
> >
> > https://github.com/apache/parquet-cpp/blob/c405bf36506ec584e8009a6d53349277e600467d/src/parquet/arrow/schema.cc#L635
> >
> > On Thu, Apr 19, 2018 at 8:09 AM, Phillip Cloud <cpcl...@gmail.com> wrote:
> > > Hi Colin,
> > >
> > > Only 128-bit decimal writing is supported right now. Feel free to open a JIRA about this.
> > >
> > > On Wed, Apr 18, 2018, 19:10 Wes McKinney <wesmck...@gmail.com> wrote:
> > > > hi Colin,
> > > >
> > > > Phillip Cloud is the expert on this topic, but I believe we only support writing decimals to the FIXED_LEN_BYTE_ARRAY physical type in Parquet right now:
> > > >
> > > > https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/writer.cc#L798
> > > >
> > > > The size of the type depends on the decimal precision, so if we can write to 32- or 64-bit widths, then we do that. Writing to INT32 or INT64 would be more complicated and require some work in parquet-cpp.
> > > >
> > > > - Wes
> > > >
> > > > On Wed, Apr 18, 2018 at 7:04 PM, Colin Nichols <co...@narrativ.com> wrote:
> > > > > Hi all,
> > > > >
> > > > > Any thoughts on the below? I did a little more code browsing and I'm not sure this is supported right now; should I open a Jira ticket?
> > > > >
> > > > > - Colin
> > > > >
> > > > > On Tue, Apr 17, 2018 at 11:11 PM, Colin Nichols <co...@narrativ.com> wrote:
> > > > >
> > > > > > Hi there,
> > > > > >
> > > > > > I know (py)arrow has the decimal128() type, and using this type it's easy to take an array of Python Decimals, convert to a pa.array, and write out to Parquet.
> > > > > >
> > > > > > In the absence (afaict) of decimal32 and decimal64 types, is it possible to go from an array of Decimals (with compatible precision/scale) and write them to a Parquet column of 32- or 64-bit width?
> > > > > >
> > > > > > Relevant Parquet spec -- https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal
> > > > > >
> > > > > > I'm looking to add this functionality to the project Spectrify, as AWS Redshift Spectrum will not query unnecessarily-wide DECIMAL columns -- https://github.com/hellonarrativ/spectrify/issues/14
> > > > > >
> > > > > > Thanks,
> > > > > > Colin
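[For readers of the archive: the decimal128() round trip Colin describes at the start of the thread can be sketched in a few lines of pyarrow. This is a minimal illustration only -- the column name, values, and precision/scale are made up for the example, and it is not Spectrify's code:]

```python
from decimal import Decimal
import os
import tempfile

import pyarrow as pa
import pyarrow.parquet as pq

# Precision 4, scale 2: values like 12.34 (4 significant digits, 2 after the point).
values = [Decimal("12.34"), Decimal("-5.60"), None]
arr = pa.array(values, type=pa.decimal128(4, 2))
table = pa.table({"amount": arr})

# Write and read back; the column is stored as a Parquet DECIMAL.
path = os.path.join(tempfile.mkdtemp(), "decimals.parquet")
pq.write_table(table, path)
roundtrip = pq.read_table(path)

# Values come back as Python Decimals with the declared precision/scale.
print(roundtrip.column("amount").to_pylist())
```

[Whether the on-disk physical type ends up narrow enough for Redshift Spectrum is exactly the open question in this thread.]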
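[The "shrinking" Wes and Phillip point to is choosing the smallest FIXED_LEN_BYTE_ARRAY width whose signed two's-complement range covers every value of the declared precision. A rough Python sketch of that rule -- `min_decimal_bytes` is a hypothetical helper written for this note, not the actual parquet-cpp function linked above:]

```python
def min_decimal_bytes(precision: int) -> int:
    """Smallest signed byte width that can hold any `precision`-digit decimal.

    A width of n bytes stores values up to 2**(8*n - 1) - 1; the largest
    `precision`-digit unscaled value is 10**precision - 1.
    """
    for n in range(1, 17):  # Parquet decimals here max out at 16 bytes (128 bits)
        if 10 ** precision - 1 <= 2 ** (8 * n - 1) - 1:
            return n
    raise ValueError("precision > 38 does not fit in 16 bytes")
```

[This is consistent with the Parquet spec's mapping: precision <= 9 fits in 4 bytes (INT32 range) and precision <= 18 fits in 8 bytes (INT64 range), which is why narrower physical types are possible in principle even though pyarrow only writes decimal128 today.]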