I have only been using Spark through the SQL front-end (CLI or JDBC). I don't
think I have access to saveAsParquetFile from there, do I?
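For reference, saveAsParquetFile is a method on SchemaRDD in Spark 1.1, so it is reachable from spark-shell (or a compiled job) rather than from the spark-sql CLI or the JDBC server. A minimal sketch, assuming the Hive metastore tables are visible through a HiveContext; the table name csv_table and the output path are placeholders, not anything from this thread:

  // Sketch (Spark 1.1, spark-shell): saveAsParquetFile lives on SchemaRDD,
  // i.e. on the programmatic API rather than the SQL front-end.
  // "csv_table" and the output path below are hypothetical placeholders.
  import org.apache.spark.sql.hive.HiveContext

  val hiveContext = new HiveContext(sc)                    // sc is the shell's SparkContext
  val rows = hiveContext.sql("SELECT * FROM csv_table")    // SchemaRDD backed by the Hive table
  rows.saveAsParquetFile("hdfs:///tmp/csv_table_parquet")  // writes Parquet part files to HDFS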
I am trying to load data from CSV format into Parquet using Spark SQL.
It consistently runs out of memory.
The environment is:
* standalone cluster using HDFS and the Hive metastore from HDP 2.0
* Spark 1.1.0
* Parquet jar files (v1.5) explicitly added when starting spark-sql.
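The actual statements aren't shown in this excerpt, but a CSV-to-Parquet load in this kind of setup usually takes roughly the shape below. This is only a hypothetical sketch (table names, columns, and locations are made up), written against HiveContext from spark-shell for concreteness; the same HiveQL can be run from the spark-sql CLI. The Parquet DDL assumes the parquet-hive SerDe classes provided by the separately added Parquet 1.5 jars, since the Hive 0.12 that ships with HDP 2.0 has no STORED AS PARQUET shorthand.

  // Hypothetical sketch of a CSV -> Parquet load with Spark SQL 1.1 and a Hive
  // metastore; names, columns, and paths are placeholders, not taken from the thread.
  import org.apache.spark.sql.hive.HiveContext

  val hiveContext = new HiveContext(sc)

  // External table over the raw CSV files already sitting in HDFS.
  hiveContext.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS events_csv (id INT, ts STRING, payload STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION 'hdfs:///data/events_csv'""")

  // Parquet-backed table declared through the parquet-hive SerDe classes
  // (assumed to come from the explicitly added Parquet 1.5 jars).
  hiveContext.sql("""
    CREATE TABLE IF NOT EXISTS events_parquet (id INT, ts STRING, payload STRING)
    ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
    STORED AS
      INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
      OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'""")

  // The large insert that, per the report above, consistently runs out of memory.
  hiveContext.sql(
    "INSERT OVERWRITE TABLE events_parquet SELECT id, ts, payload FROM events_csv")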