Hi Matei,

Thanks, I can see you've been hard at work on this! I examined your patch and 
have a question. It appears you're limiting decimals written to parquet to 
precisions whose unscaled values fit in a long, yet you're writing the values 
as a parquet binary type. Why not write them using the parquet int64 type 
instead?
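
For illustration (a sketch in Scala with a made-up value, not code from your 
patch): since you already guarantee the unscaled value fits in a long,

    val d = new java.math.BigDecimal("12345.67")  // precision 7, scale 2
    val unscaled = d.unscaledValue()              // BigInteger 1234567
    require(unscaled.bitLength() <= 63)           // fits in a signed long
    val asLong: Long = unscaled.longValue()

that long could be written directly with the parquet int64 type, annotated 
DECIMAL(7, 2), rather than serializing the unscaled bytes as a 
variable-length binary, which costs extra length bookkeeping per value.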

Cheers,

Michael

On Oct 12, 2014, at 3:32 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:

> Hi Michael,
> 
> I've been working on this in my repo: 
> https://github.com/mateiz/spark/tree/decimal. I'll make some pull requests 
> with these features soon, but meanwhile you can try this branch. See 
> https://github.com/mateiz/spark/compare/decimal for the individual commits 
> that went into it. It has exactly the precision stuff you need, plus some 
> optimizations for working on decimals.
> 
> Matei
> 
> On Oct 12, 2014, at 1:51 PM, Michael Allman <mich...@videoamp.com> wrote:
> 
>> Hello,
>> 
>> I'm interested in reading and writing SchemaRDDs to and from parquet using 
>> the Parquet Decimal converted type. The first thing I did was update the 
>> Spark parquet dependency to version 1.5.0, as this version introduced 
>> support for decimals in parquet. However, conversion between the catalyst 
>> decimal type and the parquet decimal type is complicated by the fact that 
>> the catalyst type does not specify a precision and scale, whereas the 
>> parquet type requires both.
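>> 
>> For example, the parquet DECIMAL annotation always carries both values (a 
>> sketch in parquet's schema notation; "amount" and the (18, 2) are made up):
>> 
>>   message spark_schema {
>>     optional binary amount (DECIMAL(18, 2));
>>   }
>> 
>> With an unparameterized catalyst decimal, there's nothing to fill in for 
>> the (18, 2).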
>> 
>> Could we perhaps add an optional precision and scale to the catalyst 
>> decimal type? The catalyst decimal type would have unspecified precision 
>> and scale by default for backwards compatibility, but users who want to 
>> serialize a SchemaRDD with decimal columns to parquet would have to narrow 
>> their decimal type(s) by specifying a precision and scale.
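>> 
>> Concretely, something like this (just a sketch of the idea, not a patch; 
>> the names are made up):
>> 
>>   abstract class DataType  // stand-in for catalyst's DataType
>>   case class DecimalType(precisionInfo: Option[(Int, Int)]) extends DataType
>> 
>>   DecimalType(None)           // today's behavior: precision/scale unspecified
>>   DecimalType(Some((18, 2)))  // narrowed: maps to parquet DECIMAL(18, 2)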
>> 
>> Thoughts?
>> 
>> Michael
> 

