Hi,

I am trying to figure out how to write parquet metadata when persisting
DataFrames to parquet using Spark (1.4.1).

I could not find a way to change the schema name (which seems to be
hardcoded to "root"), nor a way to add entries to the key/value metadata in
the parquet footer
(org.apache.parquet.hadoop.metadata.FileMetaData#getKeyValueMetaData).
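For reference, the footer metadata in question can be inspected with the
parquet-mr API along these lines (a sketch only; the file path is a
placeholder, and readFooter signatures vary across parquet versions):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.parquet.hadoop.ParquetFileReader
    import org.apache.parquet.hadoop.metadata.ParquetMetadata

    // Hypothetical path to a part-file written by Spark.
    val footer: ParquetMetadata = ParquetFileReader.readFooter(
      new Configuration(), new Path("/tmp/out/part-r-00000.parquet"))

    // Schema (message type) name: Spark writes this as "root".
    println(footer.getFileMetaData.getSchema.getName)

    // Key/value metadata stored in the footer (a java.util.Map[String, String]).
    println(footer.getFileMetaData.getKeyValueMetaData)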
> [...]
> Some Parquet data models, like parquet-avro, do support it, while some
> others don't (e.g. parquet-hive).
>
> Cheng
>
> On 9/21/15 7:13 AM, Borisa Zivkovic wrote:
> > Hi,
> >
> > I am trying to figure out how to write parquet metadata when
> > persisting DataFrames to parquet using Spark (1.4.1).
> [...]
> All part-files' key/value metadata entries get merged here. The problem
> is that, if a single key is associated with multiple values, Parquet
> doesn't know how to reconcile this situation, and simply gives up writing
> summary files. This can be particularly annoying for appending. In
> general, users should avoid storing conflicting values for the same key.
Hi,

your suggestion works nicely. I was able to attach metadata to columns and
read that metadata back, both from Spark and by using ParquetFileReader
(roughly as sketched below).

It would be nice if we had a way to manipulate parquet metadata directly
from DataFrames, though.
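For the archive, the approach that worked was along these lines (a sketch;
the column names, output path, and "comment" key are made up for
illustration, and df/sqlContext are assumed to exist):

    import org.apache.spark.sql.types.MetadataBuilder

    // Attach key/value metadata to a column via Column.as(alias, metadata).
    val md = new MetadataBuilder().putString("comment", "customer id").build()
    val withMd = df.select(df("id").as("id", md), df("name"))

    withMd.write.parquet("/tmp/with-metadata")

    // The metadata survives the round trip through parquet, because Spark
    // serializes its schema (including column metadata) into the footer's
    // key/value metadata, which is also visible via ParquetFileReader.
    val back = sqlContext.read.parquet("/tmp/with-metadata")
    println(back.schema("id").metadata.getString("comment"))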
regards