Hi, I am trying to save a DataFrame with many partitions into a single CSV file, and it
takes forever for large data sets of around 5-6 GB.
sourceFrame.coalesce(1).write().format("com.databricks.spark.csv").option("codec", "gzip").save("/path/hadoop");
For small data the code above works well, but for large data it hangs and never finishes,
because a single partition (one task) has to handle gigabytes of data.
Please help.
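A minimal sketch of one common workaround: drop coalesce(1) so Spark writes one part-* file per partition in parallel, then concatenate the parts into a single file afterwards. It assumes the output directory is on a local filesystem (for an HDFS directory, `hdfs dfs -getmerge` does the same concatenation into a local file). The class MergeParts and its method are hypothetical. Plain byte concatenation is only safe when the parts carry no per-file CSV header. It also works for gzip output, because a sequence of gzip members is itself a valid gzip file.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: merges Spark "part-*" output files into one file.
public class MergeParts {

    // Concatenates every file in partsDir whose name matches "part-*"
    // (e.g. part-00000, part-00001.gz), in name order, into target.
    // Files such as _SUCCESS or hidden .crc files do not match the glob.
    public static void mergeParts(Path partsDir, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(partsDir, "part-*")) {
            for (Path p : ds) {
                parts.add(p);
            }
        }
        // Zero-padded part numbers sort correctly as strings, preserving partition order.
        Collections.sort(parts);
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path p : parts) {
                Files.copy(p, out); // raw byte copy, no re-parsing of CSV rows
            }
        }
    }
}
```

The merge is a sequential byte copy, so it avoids funnelling the whole upstream computation through a single Spark task.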
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/coalesce-1-saveAsTextfile-takes-forever-tp25886.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.