I have only been using Spark through the SQL front-end (CLI or JDBC). I don't
think I have access to saveAsParquetFile from there, do I?
I would hope that this kind of workflow works.
I'm curious whether you have tried using saveAsParquetFile instead of inserting
directly into a Hive table (you could still register the result as an external
table afterwards). Right now, inserting into Hive tables goes through their
SerDe path, which is likely more memory-intensive than writing Parquet natively.
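For reference, a rough, untested sketch of that approach from the spark-shell
(assuming Spark 1.1.0, that sc is the SparkContext, and placeholder paths,
schema, and table names):

  import org.apache.spark.sql.hive.HiveContext

  // hypothetical record schema for the CSV rows
  case class Event(id: Int, name: String)

  val hiveContext = new HiveContext(sc)
  import hiveContext.createSchemaRDD  // implicit RDD[Product] -> SchemaRDD

  // parse the CSV into a SchemaRDD and write Parquet directly,
  // bypassing the Hive insert path
  val events = sc.textFile("hdfs:///user/me/events.csv")
    .map(_.split(","))
    .map(f => Event(f(0).toInt, f(1)))
  events.saveAsParquetFile("hdfs:///user/me/events_parquet")

  // then register the directory as an external table so the SQL CLI /
  // JDBC server can query it; the exact SerDe and input/output format
  // class names depend on the parquet-hive jars on your classpath
  hiveContext.sql("""
    CREATE EXTERNAL TABLE events_parquet (id INT, name STRING)
    ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
    STORED AS
      INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
      OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
    LOCATION 'hdfs:///user/me/events_parquet'
  """)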
I am trying to load data from CSV format into Parquet using Spark SQL
(the statements involved are sketched below the environment list).
It consistently runs out of memory.
The environment is:
* standalone cluster using HDFS and the Hive metastore from HDP 2.0
* Spark 1.1.0
* Parquet jar files (v1.5) explicitly added when starting spark-sql
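The statements I run in spark-sql look roughly like the following sketch
(table names, columns, and paths are placeholders):

  -- external table over the raw CSV files
  CREATE EXTERNAL TABLE events_csv (id INT, name STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION 'hdfs:///user/me/events_csv';

  -- Parquet-backed target table, using the SerDe and format classes
  -- from the parquet-hive jars added at startup
  CREATE TABLE events_parquet (id INT, name STRING)
  ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
  STORED AS
    INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
    OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';

  -- this is the insert that runs out of memory
  INSERT OVERWRITE TABLE events_parquet SELECT * FROM events_csv;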