Hi Matei,

Another thing occurred to me: will the binary encoding you're writing preserve 
numeric sort order, or would the decimals have to be decoded for comparison?
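
For concreteness, this is the kind of thing I have in mind -- raw 
two's-complement bytes don't compare correctly across signs unless the sign 
bit is flipped first. Just an illustrative sketch, not a guess at what your 
branch does:

    // Flip the sign bit and write big-endian so that unsigned lexicographic
    // byte comparison matches numeric order (assumes both values share a scale).
    def sortableBytes(unscaled: Long): Array[Byte] = {
      val flipped = unscaled ^ Long.MinValue                     // flip sign bit
      java.nio.ByteBuffer.allocate(8).putLong(flipped).array()   // big-endian
    }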

Cheers,

Michael


> On Oct 12, 2014, at 10:48 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> 
> The fixed-length binary type can use fewer bytes than an int64, though many 
> encodings of int64 can probably do the right thing as well. We can look into 
> supporting multiple ways to do this -- the spec does say that you should at 
> least be able to read int32s and int64s.
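> 
> For example, the minimum width needed for a given precision works out roughly 
> like this (back-of-the-envelope sketch, not code from my branch):
> 
>     // Smallest byte width whose signed range covers 10^precision - 1.
>     def minBytesForPrecision(precision: Int): Int =
>       Stream.from(1)
>         .find(n => BigInt(2).pow(8 * n - 1) > BigInt(10).pow(precision) - 1)
>         .get
>     // precision 5 -> 3 bytes, precision 9 -> 4 bytes, precision 18 -> 8 bytes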
> 
> Matei
> 
> On Oct 12, 2014, at 8:20 PM, Michael Allman <mich...@videoamp.com> wrote:
> 
>> Hi Matei,
>> 
>> Thanks, I can see you've been hard at work on this! I examined your patch 
>> and do have a question. It appears you're limiting the precision of decimals 
>> written to Parquet to those whose unscaled values will fit in a long, yet 
>> you're writing the values as a Parquet binary type. Why not write them using 
>> the int64 Parquet type instead?
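>> 
>> I ask because any decimal you let through would fit its unscaled value in a 
>> long anyway, e.g. (a sketch using the java.math API, not taken from your patch):
>> 
>>     // A decimal with precision <= 18 fits its unscaled value in a long.
>>     val d = new java.math.BigDecimal("12345.67")   // precision 7, scale 2
>>     require(d.precision() <= 18)
>>     val unscaled: Long = d.unscaledValue().longValue()  // 1234567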
>> 
>> Cheers,
>> 
>> Michael
>> 
>> On Oct 12, 2014, at 3:32 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>> 
>>> Hi Michael,
>>> 
>>> I've been working on this in my repo: 
>>> https://github.com/mateiz/spark/tree/decimal. I'll make some pull requests 
>>> with these features soon, but meanwhile you can try this branch. See 
>>> https://github.com/mateiz/spark/compare/decimal for the individual commits 
>>> that went into it. It has exactly the precision stuff you need, plus some 
>>> optimizations for working on decimals.
>>> 
>>> Matei
>>> 
>>> On Oct 12, 2014, at 1:51 PM, Michael Allman <mich...@videoamp.com> wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I'm interested in reading/writing Parquet SchemaRDDs that support the 
>>>> Parquet DECIMAL converted type. The first thing I did was update the Spark 
>>>> Parquet dependency to version 1.5.0, as this version introduced support 
>>>> for decimals in Parquet. However, conversion between the Catalyst decimal 
>>>> type and the Parquet decimal type is complicated by the fact that the 
>>>> Catalyst type does not specify a precision and scale but the Parquet type 
>>>> requires them.
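>>>> 
>>>> To make the mismatch concrete, the Parquet schema language forces both 
>>>> numbers to be chosen up front, while a bare Catalyst DecimalType carries 
>>>> neither. Illustrative schema text only:
>>>> 
>>>>     // The DECIMAL annotation always takes (precision, scale).
>>>>     val parquetField = "required binary price (DECIMAL(10, 2));"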
>>>> 
>>>> I'm wondering if perhaps we could add an optional precision and scale to 
>>>> the Catalyst decimal type? The Catalyst decimal type would have 
>>>> unspecified precision and scale by default for backwards compatibility, 
>>>> but users who want to serialize a SchemaRDD with decimals to Parquet 
>>>> would have to narrow their decimal types by specifying a precision and 
>>>> scale.
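>>>> 
>>>> Roughly what I have in mind, as a hypothetical sketch to illustrate the 
>>>> proposal (not existing Spark code):
>>>> 
>>>>     // Precision/scale are optional, so existing DecimalType usage keeps
>>>>     // working; Parquet serialization would require them to be set.
>>>>     case class PrecisionInfo(precision: Int, scale: Int)
>>>>     case class DecimalType(precisionInfo: Option[PrecisionInfo] = None)
>>>> 
>>>>     val unlimited = DecimalType()                             // today's behavior
>>>>     val narrowed  = DecimalType(Some(PrecisionInfo(10, 2)))   // for Parquet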
>>>> 
>>>> Thoughts?
>>>> 
>>>> Michael
>>>> 
>>> 
>> 
> 

