Hi there! Is there a way to modify the default Parquet block size?
I didn't see any reference to ParquetOutputFormat.setBlockSize in the Spark code, so I was wondering whether there is a way to provide this option. I'm asking because we are running into out-of-memory issues when writing Parquet files. The RDD we are saving to Parquet has a fairly high number of columns (in the thousands, around 3k at the moment).

The only way we have found to get rid of this so far is to do a .coalesce on the SchemaRDD before saving to Parquet, but as we add more columns, even this approach stops working.

Any help is appreciated!

Thanks,
Pierre
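P.S. For concreteness, here is a rough sketch of what I was hoping to be able to do. The "parquet.block.size" property name, the 64 MB value, and the idea of setting it on the SparkContext's Hadoop configuration are my assumptions, not something I found in the Spark sources:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("parquet-block-size"))

// Guess: shrink the Parquet row-group ("block") size so that buffering ~3k
// column writers per open file needs less memory before a flush to disk.
// The property name and the value are assumptions on my part.
sc.hadoopConfiguration.setInt("parquet.block.size", 64 * 1024 * 1024)  // 64 MB

val sqlContext = new SQLContext(sc)

// Our current workaround, which reduces how many files are written at once:
// schemaRDD.coalesce(smallerNumPartitions).saveAsParquetFile("/path/to/output")
// What we would like to do once the block size can be lowered:
// schemaRDD.saveAsParquetFile("/path/to/output")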