Hi Matei,

Thanks, I can see you've been hard at work on this! I examined your patch and do have a question. It appears you're limiting the precision of decimals written to parquet to those that will fit in a long, yet you're writing the values as a parquet binary type. Why not write them using the int64 parquet type instead?
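To make the comparison concrete, here's a rough, self-contained sketch (plain Scala, not code from your patch) of what gets stored per value under the two encodings:

    import java.math.BigDecimal

    object DecimalEncodingSketch {
      // What an int64-backed DECIMAL column would store: just the unscaled value.
      // Assumes the precision has already been checked to fit in a Long, as your
      // patch does.
      def asUnscaledLong(d: BigDecimal): Long =
        d.unscaledValue.longValue

      // What a binary-backed DECIMAL column stores: the minimal two's-complement
      // bytes of the unscaled value, with a length recorded per value in the file.
      def asUnscaledBytes(d: BigDecimal): Array[Byte] =
        d.unscaledValue.toByteArray

      def main(args: Array[String]): Unit = {
        val d = new BigDecimal("12345.67")   // precision 7, scale 2
        println(asUnscaledLong(d))           // 1234567
        println(asUnscaledBytes(d).length)   // 3
      }
    }

With int64 each value is a fixed-width 8 bytes, whereas the binary form is variable-length and carries that length overhead for every value.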
Cheers,
Michael

On Oct 12, 2014, at 3:32 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:

> Hi Michael,
>
> I've been working on this in my repo:
> https://github.com/mateiz/spark/tree/decimal. I'll make some pull requests
> with these features soon, but meanwhile you can try this branch. See
> https://github.com/mateiz/spark/compare/decimal for the individual commits
> that went into it. It has exactly the precision stuff you need, plus some
> optimizations for working on decimals.
>
> Matei
>
> On Oct 12, 2014, at 1:51 PM, Michael Allman <mich...@videoamp.com> wrote:
>
>> Hello,
>>
>> I'm interested in reading/writing parquet SchemaRDDs that support the
>> Parquet Decimal converted type. The first thing I did was update the Spark
>> parquet dependency to version 1.5.0, as this version introduced support for
>> decimals in parquet. However, conversion between the catalyst decimal type
>> and the parquet decimal type is complicated by the fact that the catalyst
>> type does not specify a decimal precision and scale but the parquet type
>> requires them.
>>
>> I'm wondering if perhaps we could add an optional precision and scale to the
>> catalyst decimal type? The catalyst decimal type would have unspecified
>> precision and scale by default for backwards compatibility, but users who
>> want to serialize a SchemaRDD with decimal(s) to parquet would have to
>> narrow their decimal type(s) by specifying a precision and scale.
>>
>> Thoughts?
>>
>> Michael
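P.S. To make the proposal from my earlier message a bit more concrete, here is the rough shape I had in mind for the catalyst type (a hypothetical sketch with illustrative names, not actual code from Spark or from your branch):

    // Sketch: a catalyst DecimalType whose precision and scale are optional.
    // Leaving them unset keeps today's behavior; a SchemaRDD headed for parquet
    // would narrow the type by supplying both.
    case class PrecisionInfo(precision: Int, scale: Int)

    case class DecimalType(precisionInfo: Option[PrecisionInfo] = None) {
      def precision: Option[Int] = precisionInfo.map(_.precision)
      def scale: Option[Int]     = precisionInfo.map(_.scale)
    }

    object DecimalTypeSketch {
      def main(args: Array[String]): Unit = {
        val unlimited = DecimalType()                            // backwards-compatible default
        val narrowed  = DecimalType(Some(PrecisionInfo(10, 2)))  // e.g. decimal(10, 2) for parquet
        println(unlimited.precision)   // None
        println(narrowed.scale)        // Some(2)
      }
    }

The None default is what would keep existing code working unchanged while letting the parquet writer require a fully specified precision and scale.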