Re: Merge metadata error when appending to parquet table

2015-08-09 Thread Cheng Lian
The conflicting metadata values warning is a known issue (https://issues.apache.org/jira/browse/PARQUET-194). The option "parquet.enable.summary-metadata" is a Hadoop option rather than a Spark option, so you need to either add it to your Hadoop configuration file(s) or add it via `sparkContext.h
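
The snippet above is cut off in this archive. As a rough sketch only, assuming the truncated call refers to the `hadoopConfiguration` member of `SparkContext`, and assuming that the value "false" is what disables the summary file, setting the option programmatically might look like this (the application name and local master are hypothetical placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DisableParquetSummary {
  def main(args: Array[String]): Unit = {
    // Hypothetical app name and master, used only to make the sketch complete.
    val conf = new SparkConf()
      .setAppName("disable-parquet-summary-sketch")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Assumption: this is the call the truncated message was pointing at.
    // The option is read by Parquet through the Hadoop configuration,
    // not through Spark's own settings.
    sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")

    // ... create the SQLContext and write Parquet output here ...

    sc.stop()
  }
}
```

The option has to be set before the Parquet write starts, since the Hadoop configuration is what the write job picks up.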

Re: Merge metadata error when appending to parquet table

2015-08-09 Thread Krzysztof Zarzycki
Besides finding a solution to this problem, I think I can work around at least the WARNING message by overriding the Parquet variable parquet.enable.summary-metadata. According to the PARQUET-107 ticket, it can be used to disable writing the summary file which is

Merge metadata error when appending to parquet table

2015-08-09 Thread Krzysztof Zarzycki
Hi there, I have a problem with a Spark Streaming job running on Spark 1.4.1 that appends to a Parquet table. My job receives JSON strings and creates a JsonRdd out of them. The JSON documents may come in different shapes, as most of the fields are optional, but they never have conflicting schemas. Next, for e