spark + parquet + schema name and metadata

2015-09-21 Thread Borisa Zivkovic
Hi, I am trying to figure out how to write Parquet metadata when persisting DataFrames to Parquet using Spark (1.4.1). I could not find a way to change the schema name (which seems to be hardcoded to root), nor a way to add data to the key/value metadata in the Parquet footer. org.apache.parquet.hadoop.met

Re: spark + parquet + schema name and metadata

2015-09-22 Thread Borisa Zivkovic
quet-avro, do support it, while some others
> don't (e.g. parquet-hive).
>
> Cheng
>
> On 9/21/15 7:13 AM, Borisa Zivkovic wrote:
> > Hi,
> >
> > I am trying to figure out how to write parquet metadata when
> > persisting DataFrames to parquet using Spark

Re: spark + parquet + schema name and metadata

2015-09-23 Thread Borisa Zivkovic
es get merged here. The
> problem is that, if a single key is associated with multiple values,
> Parquet doesn't know how to reconcile this situation, and simply gives up
> writing summary files. This can be particularly annoying for appending. In
> general, users should avoid storing

Re: spark + parquet + schema name and metadata

2015-09-24 Thread Borisa Zivkovic
Hi,

your suggestion works nicely. I was able to attach metadata to columns and read that metadata back, both from Spark and by using ParquetFileReader.

It would be nice if we had a way to manipulate Parquet metadata directly from DataFrames, though.

regards

On Wed, 23 Sep 2015 at 09:25 Borisa Zivkovic
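
For readers following the thread, here is a minimal sketch of attaching metadata to a column and reading it back from Spark. The suggestion itself is cut off in the snippets above, so this assumes it referred to Spark SQL's column-level `Metadata` API (`MetadataBuilder` together with `Column.as(alias, metadata)`). The object name, the sample data, the `comment` key, and the output path are all made up for illustration. It targets the Spark 1.4.x API mentioned in the original question and does not cover changing the schema name.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.MetadataBuilder

// Hypothetical example object; not taken from the thread.
object ColumnMetadataSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("column-metadata-sketch").setMaster("local[1]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Small made-up DataFrame with two columns.
    val df = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("id", "label")

    // Build column-level metadata; the key and value are illustrative only.
    val meta = new MetadataBuilder().putString("comment", "primary key").build()

    // Re-alias the column under the same name, attaching the metadata to it.
    val annotated = df.select(df("id").as("id", meta), df("label"))

    // Assumed scratch location; adjust for your environment.
    val path = "/tmp/column-metadata-sketch.parquet"
    annotated.write.mode("overwrite").parquet(path)

    // Read the file back and look up the metadata on the reloaded schema.
    val reloaded = sqlContext.read.parquet(path)
    println(reloaded.schema("id").metadata.getString("comment"))

    sc.stop()
  }
}
```

This works because Spark writes its own schema, column metadata included, into the key/value section of the Parquet footer, which is also why the same information is visible when the footer is read with ParquetFileReader.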