Hi there!

Is there a way to modify the default Parquet block size?

I didn't see any reference to ParquetOutputFormat.setBlockSize in the Spark
code, so I was wondering whether there is a way to provide this option.
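
For reference, this is roughly what I was hoping to be able to write. I'm
assuming here that the parquet-mr key "parquet.block.size" is read from the
Hadoop configuration when saveAsParquetFile runs, which is exactly the part
I'm not sure about; the paths and the 16 MB value are just examples:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext

  val sc = new SparkContext(new SparkConf().setAppName("parquet-block-size"))
  val sqlContext = new SQLContext(sc)

  // Ask parquet-mr for 16 MB row groups instead of its default, hoping to
  // reduce the memory each writer task needs for ~3k columns.
  sc.hadoopConfiguration.setInt("parquet.block.size", 16 * 1024 * 1024)

  // Example input; our real SchemaRDD comes from elsewhere.
  val schemaRdd = sqlContext.jsonFile("hdfs:///tmp/input.json")
  schemaRdd.saveAsParquetFile("hdfs:///tmp/output.parquet")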

I'm asking because we are facing out-of-memory issues when writing Parquet
files. The RDD we are saving to Parquet has a fairly high number of columns
(in the thousands, around 3k at the moment).

The only way we have found to work around this so far is to do a .coalesce
on the SchemaRDD before saving to Parquet (roughly as sketched below), but
as the number of columns grows, even that stops working.
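
For what it's worth, the current workaround looks roughly like this (the
partition count of 8 is just an illustrative value that we tune by hand):

  // Fewer partitions before the save; helps until the column count grows further.
  val coalesced = schemaRdd.coalesce(8)
  coalesced.saveAsParquetFile("hdfs:///tmp/output.parquet")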

Any help is appreciated!

Thanks

Pierre 


